1
Horng H, Scott C, Winham S, Jensen M, Pantalone L, Mankowski W, Kerlikowske K, Vachon CM, Kontos D, Shinohara RT. Multivariate testing and effect size measures for batch effect evaluation in radiomic features. Sci Rep 2024; 14:13923. [PMID: 38886407 PMCID: PMC11183083 DOI: 10.1038/s41598-024-64208-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2024] [Accepted: 06/06/2024] [Indexed: 06/20/2024] Open
Abstract
While precision medicine applications of radiomics analysis are promising, differences in image acquisition can cause "batch effects" that reduce reproducibility and affect downstream predictive analyses. Harmonization methods such as ComBat have been developed to correct these effects, but evaluation methods for quantifying batch effects are inconsistent. In this study, we propose the use of the multivariate statistical test PERMANOVA and the Robust Effect Size Index (RESI) to better quantify and characterize batch effects in radiomics data. We evaluate these methods in both simulated and real radiomics features extracted from full-field digital mammography (FFDM) data. PERMANOVA demonstrated higher power than standard univariate statistical testing, and RESI was able to interpretably quantify the effect size of site at extremely large sample sizes. These methods show promise as more powerful and interpretable methods for the detection and quantification of batch effects in radiomics studies.
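To illustrate the kind of multivariate test this abstract refers to, the following is a minimal sketch of a one-factor PERMANOVA, assuming Euclidean distances between feature vectors and the standard pseudo-F statistic built from pairwise distances. The function names (`pseudo_f`, `permanova`), the permutation count, and the seed are illustrative assumptions and are not taken from the paper.

```python
import random
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def pseudo_f(dist, labels):
    """Pseudo-F statistic from a full pairwise distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared pairwise distances divided by N.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(points, labels, n_perm=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a batch/site label (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(points)
    dist = [[euclidean(points[i], points[j]) for j in range(n)] for i in range(n)]
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, keep distances fixed
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)
```

In this sketch, each row of `points` would stand for one image's radiomic feature vector and `labels` for its acquisition site; a small p-value suggests that the multivariate feature distribution differs between sites.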
Affiliation(s)
- Hannah Horng
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Penn Statistics in Imaging Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Lauren Pantalone
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Walter Mankowski
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
- Despina Kontos
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Center for Innovation in Imaging Biomarkers and Integrated Diagnostics (CIMBID), Columbia University, New York, NY, 10027, USA
- Russell T Shinohara
  - Department of Radiology, Center for Biomedical Image Computing and Analysis (CBICA), University of Pennsylvania, Philadelphia, PA, 19104, USA
  - Penn Statistics in Imaging Endeavor (PennSIVE), Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, 19104, USA
2
Cheek CL, Lindner P, Grigorenko EL. Statistical and Machine Learning Analysis in Brain-Imaging Genetics: A Review of Methods. Behav Genet 2024; 54:233-251. [PMID: 38336922 DOI: 10.1007/s10519-024-10177-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Brain-imaging-genetic analysis is an emerging field of research that aims to aggregate data from neuroimaging modalities, which characterize brain structure or function, and genetic data, which capture the structure and function of the genome, to explain or predict normal (or abnormal) brain performance. Brain-imaging-genetic studies offer great potential for understanding complex brain-related diseases/disorders of genetic etiology. Still, a combined brain-wide genome-wide analysis is difficult to perform, as typical datasets fuse multiple modalities, each with high dimensionality, unique correlational landscapes, and often low statistical signal-to-noise ratios. In this review, we outline the progress in brain-imaging-genetic methodologies, from early massive univariate to current deep learning approaches, highlighting each approach's strengths and weaknesses and situating it within the field's development. We conclude by discussing selected remaining challenges and prospects for the field.
Affiliation(s)
- Connor L Cheek
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Physics, University of Houston, Houston, TX, USA
- Peggy Lindner
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Information Science Technology, University of Houston, Houston, TX, USA
- Elena L Grigorenko
  - Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
  - Department of Psychology, University of Houston, Houston, TX, USA
  - Baylor College of Medicine, Houston, TX, USA
  - Sirius University of Science and Technology, Sochi, Russia
3
Kong M, Kim H, Hong T. An effective alerting strategy to facilitate occupants' perception of indoor air quality: By alarming concentration of indoor air pollution. Environmental Pollution (Barking, Essex: 1987) 2023; 325:121428. [PMID: 36914153 DOI: 10.1016/j.envpol.2023.121428] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/12/2022] [Revised: 01/27/2023] [Accepted: 03/08/2023] [Indexed: 06/18/2023]
Abstract
Previous studies have shown that it is hard for occupants to perceive the concentration of indoor air pollution (IAP), and the resulting indoor air quality (IAQ), on their own. A method is therefore needed to direct their attention to the actual IAP, and alerting is suggested for this purpose. However, previous studies are limited in that they did not analyze the effects of alerting occupants to the concentration of IAP on their IAQ perception. To fill this research gap, this study sought a proper strategy to help occupants perceive IAQ more clearly. A one-month observational experiment was conducted on nine subjects under three scenarios with different alerting strategies. In addition, the visual distance estimation method was used to quantitatively analyze how closely the subjects' perceived IAQ tracked the concentration of IAP in each scenario. The experimental results confirmed that when no alerting notification was sent, the occupants could not clearly perceive IAQ, as the visual distance was the highest at 0.332. On the other hand, when an alerting notification indicating whether the concentration of IAP exceeded the standard was sent, the occupants could perceive IAQ relatively clearly, as the visual distance was reduced to 0.291 and 0.236. In conclusion, not only installing a monitoring device but also establishing proper alerting strategies on the concentration of IAP is essential to facilitate occupants' IAQ perception and protect occupants' health.
Affiliation(s)
- Minjin Kong
  - Department of Architecture and Architectural Engineering, Yonsei University, Seoul, South Korea
- Hakpyeong Kim
  - Department of Architecture and Architectural Engineering, Yonsei University, Seoul, South Korea
- Taehoon Hong
  - Department of Architecture and Architectural Engineering, Yonsei University, Seoul, South Korea
4
Distance-Based Analysis with Quantile Regression Models. Statistics in Biosciences 2021; 13:291-312. [DOI: 10.1007/s12561-021-09306-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
5
Leech CM, Flynn MJ, Arsenault HE, Ou J, Liu H, Zhu LJ, Benanti JA. The coordinate actions of calcineurin and Hog1 mediate the stress response through multiple nodes of the cell cycle network. PLoS Genet 2020; 16:e1008600. [PMID: 32343701 PMCID: PMC7209309 DOI: 10.1371/journal.pgen.1008600] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 10/25/2019] [Revised: 05/08/2020] [Accepted: 01/07/2020] [Indexed: 12/19/2022] Open
Abstract
Upon exposure to environmental stressors, cells transiently arrest the cell cycle while they adapt and restore homeostasis. A challenge for all cells is to distinguish between stress signals and coordinate the appropriate adaptive response with cell cycle arrest. Here we investigate the role of the phosphatase calcineurin (CN) in the stress response and demonstrate that CN activates the Hog1/p38 pathway in both yeast and human cells. In yeast, the MAPK Hog1 is transiently activated in response to several well-studied osmostressors. We show that when a stressor simultaneously activates CN and Hog1, CN disrupts Hog1-stimulated negative feedback to prolong Hog1 activation and the period of cell cycle arrest. Regulation of Hog1 by CN also contributes to inactivation of multiple cell cycle-regulatory transcription factors (TFs) and the decreased expression of cell cycle-regulated genes. CN-dependent downregulation of G1/S genes is dependent upon Hog1 activation, whereas CN inactivates G2/M TFs through a combination of Hog1-dependent and -independent mechanisms. These findings demonstrate that CN and Hog1 act in a coordinated manner to inhibit multiple nodes of the cell cycle-regulatory network. Our results suggest that crosstalk between CN and stress-activated MAPKs helps cells tailor their adaptive responses to specific stressors.
In order to survive exposure to environmental stress, cells transiently arrest the cell division cycle while they adapt to the stress. Several kinases and phosphatases are known to control stress adaptation programs, but the extent to which these signaling pathways work together to tune the stress response is not well understood. This study investigates the role of the phosphatase calcineurin in the stress response and shows that calcineurin inhibits the cell cycle in part by stimulating the activity of the Hog1/p38 stress-activated MAPK in both yeast and human cells. Crosstalk between stress response pathways may help cells mount specific responses to diverse stressors and to survive changes in their environment.
Affiliation(s)
- Cassandra M. Leech
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Mackenzie J. Flynn
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Heather E. Arsenault
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Jianhong Ou
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Haibo Liu
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Lihua Julie Zhu
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
  - Program in Bioinformatics and Integrative Biology, Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
- Jennifer A. Benanti
  - Department of Molecular, Cell and Cancer Biology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America
6
Shinohara RT, Shou H, Carone M, Schultz R, Tunc B, Parker D, Martin ML, Verma R. Distance-based analysis of variance for brain connectivity. Biometrics 2020; 76:257-269. [PMID: 31350904 PMCID: PMC7653688 DOI: 10.1111/biom.13123] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/14/2018] [Accepted: 07/12/2019] [Indexed: 01/07/2023]
Abstract
The field of neuroimaging dedicated to mapping connections in the brain is increasingly being recognized as key for understanding neurodevelopment and pathology. Networks of these connections are quantitatively represented using complex structures, including matrices, functions, and graphs, which require specialized statistical techniques for estimation and inference about developmental and disorder-related changes. Unfortunately, classical statistical testing procedures are not well suited to high-dimensional testing problems. In the context of global or regional tests for differences in neuroimaging data, traditional analysis of variance (ANOVA) is not directly applicable without first summarizing the data into univariate or low-dimensional features, a process that might mask the salient features of high-dimensional distributions. In this work, we consider a general framework for two-sample testing of complex structures by studying generalized within-group and between-group variances based on distances between complex and potentially high-dimensional observations. We derive an asymptotic approximation to the null distribution of the ANOVA test statistic, and conduct simulation studies with scalar and graph outcomes to study finite sample properties of the test. Finally, we apply our test to our motivating study of structural connectivity in autism spectrum disorder.
Affiliation(s)
- Russell T. Shinohara
  - Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
  - Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Haochang Shou
  - Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
  - Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Marco Carone
  - Department of Biostatistics, University of Washington, Seattle, Washington
- Robert Schultz
  - Center for Autism Research, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania
- Birkan Tunc
  - Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Drew Parker
  - Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
- Melissa Lynne Martin
  - Department of Biostatistics, Epidemiology, and Informatics, Penn Statistics in Imaging and Visualization Center, University of Pennsylvania, Philadelphia, Pennsylvania
- Ragini Verma
  - Department of Radiology, Center for Biomedical Image Computing and Analytics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania
7
Bi JH, Tong YF, Qiu ZW, Yang XF, Minna J, Gazdar AF, Song K. ClickGene: an open cloud-based platform for big pan-cancer data genome-wide association study, visualization and exploration. BioData Min 2019; 12:12. [PMID: 31391866 PMCID: PMC6595587 DOI: 10.1186/s13040-019-0202-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/20/2018] [Accepted: 06/17/2019] [Indexed: 12/15/2022] Open
Abstract
A tremendous amount of whole-genome sequencing data has been provided by large consortium projects such as TCGA (The Cancer Genome Atlas) and COSMIC, creating substantial opportunities for functional gene research and for uncovering cancer-associated mechanisms. While the existing web servers are valuable and widely used, many whole-genome analysis functions urgently needed by experimental biologists are still not adequately addressed. A cloud-based platform named CG (ClickGene) was therefore developed for do-it-yourself analysis of users' private in-house data or public genome data, without any requirement for software installation or system configuration. The CG platform provides key interactive and customized functions, including Bee-swarm plot, linear regression analyses, Mountain plot, Directional Manhattan plot, Deflection plot and Volcano plot. Using these tools, global profiles or individual gene distributions for expression and copy number variation (CNV) analyses can be generated with mouse clicks alone. The easy accessibility of such comprehensive pan-cancer genome analysis greatly facilitates data mining in wide research areas, such as the therapeutic discovery process. It therefore fills the gap between big cancer genomics data and the delivery of integrated knowledge to end-users, helping to unleash the value of current data resources. More importantly, unlike other R-based web platforms, the CG platform was developed with Dubbo, a cloud distributed service governance framework for global 'big data' stream transfer. CG runs on an independent cloud server, which ensures its steady global accessibility. More than two years of operation have shown that end-users can create advanced plots for hundreds of whole-genome datasets within seconds, anytime and anywhere. CG is available at http://www.clickgenome.org/.
Affiliation(s)
- Jia-Hao Bi
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072 China
- Yi-Fan Tong
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072 China
- Zhe-Wei Qiu
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072 China
- Xing-Feng Yang
  - School of Computer Software, Tianjin University, Tianjin, 300072 China
- John Minna
  - Hamon Center for Therapeutic Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
  - Department of Internal Medicine, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
- Adi F Gazdar
  - Hamon Center for Therapeutic Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
- Kai Song
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, 300072 China
  - Hamon Center for Therapeutic Oncology, University of Texas Southwestern Medical Center, Dallas, TX 75390 USA
8
Mazur SJ, Gallagher ES, Debnath S, Durell SR, Anderson KW, Miller Jenkins LM, Appella E, Hudgens JW. Conformational Changes in Active and Inactive States of Human PP2Cα Characterized by Hydrogen/Deuterium Exchange-Mass Spectrometry. Biochemistry 2017; 56:2676-2689. [PMID: 28481111 DOI: 10.1021/acs.biochem.6b01220] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
PPM serine/threonine protein phosphatases function in signaling pathways and require millimolar concentrations of Mn2+ or Mg2+ ions for activity. Whereas the crystal structure of human PP2Cα displayed two tightly bound Mn2+ ions in the active site, recent investigations of PPM phosphatases have characterized the binding of a third, catalytically essential metal ion. The binding of the third Mg2+ to PP2Cα was reported to have millimolar affinity and to be entropically driven, suggesting it may be structurally and catalytically important. Here, we report the use of hydrogen/deuterium exchange-mass spectrometry and molecular dynamics to characterize conformational changes in PP2Cα between the active and inactive states. In the presence of millimolar concentrations of Mg2+, metal-coordinating residues in the PP2Cα active site are maintained in a more rigid state over the catalytically relevant time scale of 30-300 s. Submillimolar Mg2+ concentrations or introduction of the D146A mutation increased the conformational mobility in the Flap subdomain and in buttressing helices α1 and α2. Residues 192-200, located in the Flap subdomain, exhibited the greatest interplay between effects of Mg2+ concentration and the D146A mutation. Molecular dynamics simulations suggest that the presence of the third metal ion and the D146A mutation each produce distinct conformational realignments in the Flap subdomain. These observations suggest that the binding of Mg2+ to the D146/D239 binding site stabilizes the conformation of the active site and the Flap subdomain.
Affiliation(s)
- Sharlyn J Mazur
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Elyssia S Gallagher
  - Bioprocess Measurement Group, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland 20850, United States
- Subrata Debnath
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Stewart R Durell
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Kyle W Anderson
  - Bioprocess Measurement Group, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland 20850, United States
- Lisa M Miller Jenkins
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Ettore Appella
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Jeffrey W Hudgens
  - Bioprocess Measurement Group, Biomolecular Measurement Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland 20850, United States
9
Mazur SJ, Weber DP. The Area Between Exchange Curves as a Measure of Conformational Differences in Hydrogen-Deuterium Exchange Mass Spectrometry Studies. Journal of the American Society for Mass Spectrometry 2017; 28:978-981. [PMID: 28236290 PMCID: PMC5907500 DOI: 10.1007/s13361-017-1615-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 11/07/2016] [Revised: 01/04/2017] [Accepted: 01/23/2017] [Indexed: 05/25/2023]
Abstract
Hydrogen-deuterium exchange mass spectrometry (HDX-MS) provides information about protein conformational mobility under native conditions. The area between exchange curves, A_bec, a functional data analysis concept, was adapted to the interpretation of HDX-MS data and provides a useful measure of exchange curve dissimilarity for tests of significance. Importantly, for most globular proteins under native conditions, A_bec values provide an estimate of the log ratio of exchange-competent fractions in the two states, and thus are related to differences in the free energy of microdomain unfolding.
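As a rough illustration of the quantity named in this abstract, the sketch below computes a signed area between two exchange curves by trapezoidal integration. It assumes that both curves are fractional deuterium uptake values sampled at the same time points and that the area is taken over a log10 time axis; the log10 axis, the trapezoidal rule, and the function name `area_between_exchange_curves` are assumptions made for illustration and may differ from the authors' definition.

```python
import math


def area_between_exchange_curves(times, uptake_a, uptake_b):
    """Signed area between two uptake curves over log10(time).

    times     -- strictly positive, increasing exchange times (e.g. seconds)
    uptake_a  -- fractional uptake in state A at each time
    uptake_b  -- fractional uptake in state B at each time
    """
    if not (len(times) == len(uptake_a) == len(uptake_b)):
        raise ValueError("all inputs must have the same length")
    if any(t <= 0 for t in times):
        raise ValueError("times must be positive to take log10")

    x = [math.log10(t) for t in times]             # assumed log-time axis
    diff = [a - b for a, b in zip(uptake_a, uptake_b)]

    area = 0.0
    for i in range(1, len(x)):
        # Trapezoid between consecutive time points.
        area += 0.5 * (diff[i] + diff[i - 1]) * (x[i] - x[i - 1])
    return area
```

Under these assumptions, a positive value indicates that state A exchanges faster than state B over the sampled time window, and a value near zero indicates similar exchange behaviour.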
Affiliation(s)
- Sharlyn J Mazur
  - Laboratory of Cell Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD, 20892, USA
10
Albrecht M, Stichel D, Müller B, Merkle R, Sticht C, Gretz N, Klingmüller U, Breuhahn K, Matthäus F. TTCA: an R package for the identification of differentially expressed genes in time course microarray data. BMC Bioinformatics 2017; 18:33. [PMID: 28088176 PMCID: PMC5237546 DOI: 10.1186/s12859-016-1440-8] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 07/08/2016] [Accepted: 12/21/2016] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: The analysis of microarray time series promises a deeper insight into the dynamics of the cellular response following stimulation. A common observation in this type of data is that some genes respond with quick, transient dynamics, while other genes change their expression slowly over time. The existing methods for detecting significant expression dynamics often fail when the expression dynamics show a large heterogeneity. Moreover, these methods often cannot cope with irregular and sparse measurements.
RESULTS: The method proposed here is specifically designed for the analysis of perturbation responses. It combines different scores to capture fast and transient dynamics as well as slow expression changes, and performs well in the presence of low replicate numbers and irregular sampling times. The results are given in the form of tables including links to figures showing the expression dynamics of the respective transcript. These allow users to quickly recognise the relevance of a detection, identify possible false positives, and discriminate between early and late changes in gene expression. An extension of the method allows the analysis of the expression dynamics of functional groups of genes, providing a quick overview of the cellular response. The performance of this package was tested on microarray data derived from lung cancer cells stimulated with epidermal growth factor (EGF).
CONCLUSION: Here we describe a new, efficient method for the analysis of sparse and heterogeneous time course data with high detection sensitivity and transparency. It is implemented as the R package TTCA (transcript time course analysis) and can be installed from the Comprehensive R Archive Network, CRAN. The source code is provided in Additional file 1.
Affiliation(s)
- Marco Albrecht
  - Complex Biological Systems Group (BIOMS/IWR), Heidelberg, Im Neuenheimer Feld 294, Heidelberg, 69120 Germany
  - Systems Biology Group, Université du Luxembourg, 7, avenue du Swing, Belvaux, L-4367 Luxembourg
- Damian Stichel
  - Complex Biological Systems Group (BIOMS/IWR), Heidelberg, Im Neuenheimer Feld 294, Heidelberg, 69120 Germany
  - CCU Neuropathology Group, German Cancer Research Center (DKFZ), Im Neuenheimer Feld 221, Heidelberg, 69120 Germany
- Benedikt Müller
  - Institute of Pathology, Heidelberg University Hospital, Im Neuenheimer Feld 672, Heidelberg, 69120 Germany
- Ruth Merkle
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
  - Translational Lung Research Center (TLRC), Member of the German Center for Lung Research (DZL), Im Neuenheimer Feld 430, Heidelberg, 69120 Germany
- Carsten Sticht
  - Medical Research Center, Medical Faculty Mannheim, University of Heidelberg, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167 Germany
- Norbert Gretz
  - Medical Research Center, Medical Faculty Mannheim, University of Heidelberg, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167 Germany
- Ursula Klingmüller
  - German Cancer Research Center (DKFZ), Im Neuenheimer Feld 280, Heidelberg, 69120 Germany
  - Translational Lung Research Center (TLRC), Member of the German Center for Lung Research (DZL), Im Neuenheimer Feld 430, Heidelberg, 69120 Germany
- Kai Breuhahn
  - Institute of Pathology, Heidelberg University Hospital, Im Neuenheimer Feld 672, Heidelberg, 69120 Germany
- Franziska Matthäus
  - Complex Biological Systems Group (BIOMS/IWR), Heidelberg, Im Neuenheimer Feld 294, Heidelberg, 69120 Germany
  - Frankfurt Institute for Advanced Studies (FIAS), Goethe University Frankfurt, Ruth-Moufang-Straße 1, Frankfurt am Main, 60438 Germany
11
Kayano M, Matsui H, Yamaguchi R, Imoto S, Miyano S. Gene set differential analysis of time course expression profiles via sparse estimation in functional logistic model with application to time-dependent biomarker detection. Biostatistics 2015; 17:235-48. [PMID: 26420796 DOI: 10.1093/biostatistics/kxv037] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 12/18/2014] [Accepted: 08/31/2015] [Indexed: 12/17/2022] Open
Abstract
High-throughput time course expression profiles have become available in the last decade due to developments in measurement techniques and devices. Functional data analysis, which treats smoothed curves instead of the originally observed discrete data, is effective for time course expression profiles in terms of dimension reduction, robustness, and applicability to data measured at a small number of irregularly spaced time points. However, statistical methods for differential analysis of time course expression profiles have not been well established. We propose a functional logistic model based on elastic net regularization (F-Logistic) in order to identify genes with dynamic alterations in case/control studies. We employ a mixed model as a smoothing method to obtain functional data; F-Logistic is then applied to time course profiles measured at a small number of irregularly spaced time points. We evaluate the performance of F-Logistic in comparison with another functional data approach, i.e. the functional ANOVA test (F-ANOVA), by applying the methods to real and synthetic time course data sets. The real data sets consist of time course gene expression profiles for long-term effects of recombinant interferon β on disease progression in multiple sclerosis. F-Logistic distinguishes dynamic alterations, which cannot be found by competing approaches such as F-ANOVA, in case/control studies based on time course expression profiles. F-Logistic is effective for time-dependent biomarker detection, diagnosis, and therapy.
Affiliation(s)
- Mitsunori Kayano
  - Department of Animal and Food Hygiene, Obihiro University of Agriculture and Veterinary Medicine, Inada-cho, Obihiro, Hokkaido 080-8555, Japan
- Hidetoshi Matsui
  - Faculty of Mathematics, Kyushu University, 744, Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Rui Yamaguchi
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Seiya Imoto
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Satoru Miyano
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
12
Minas C, Montana G. Distance-based analysis of variance: Approximate inference. Stat Anal Data Min 2014. [DOI: 10.1002/sam.11227] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
13
|
Wang Y, Goh W, Wong L, Montana G. Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes. BMC Bioinformatics 2013; 14 Suppl 16:S6. [PMID: 24564704 PMCID: PMC3853073 DOI: 10.1186/1471-2105-14-s16-s6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 01/29/2023] Open
Abstract
MOTIVATION: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarities measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. RESULTS: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated to this trait, and produces pair-wise measures of genetic proximity that can be directly compared to pair-wise measures of phenotypic proximity. Several known AD-related variants have been identified, including APOE4 and TOMM40. We also present experimental evidence supporting the hypothesis of a linear relationship between the number of top-ranked mutated states, or frequent mutation patterns, and an indicator of disease severity. AVAILABILITY: The Java codes are freely available at http://www2.imperial.ac.uk/~gmontana.
|
14
|
Minas C, Curry E, Montana G. A distance-based test of association between paired heterogeneous genomic data. Bioinformatics 2013; 29:2555-63. [PMID: 23918252 DOI: 10.1093/bioinformatics/btt450] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Due to rapid technological advances, a wide range of different measurements can be obtained from a given biological sample including single nucleotide polymorphisms, copy number variation, gene expression levels, DNA methylation and proteomic profiles. Each of these distinct measurements provides the means to characterize a certain aspect of biological diversity, and a fundamental problem of broad interest concerns the discovery of shared patterns of variation across different data types. Such data types are heterogeneous in the sense that they represent measurements taken at different scales or represented by different data structures. RESULTS: We propose a distance-based statistical test, the generalized RV (GRV) test, to assess whether there is a common and non-random pattern of variability between paired biological measurements obtained from the same random sample. The measurements enter the test through the use of two distance measures, which can be chosen to capture a particular aspect of the data. An approximate null distribution is proposed to compute P-values in closed-form and without the need to perform costly Monte Carlo permutation procedures. Compared with the classical Mantel test for association between distance matrices, the GRV test has been found to be more powerful in a number of simulation settings. We also demonstrate how the GRV test can be used to detect biological pathways in which genetic variability is associated to variation in gene expression levels in an ovarian cancer sample, and present results obtained from two independent cohorts. AVAILABILITY: R code to compute the GRV test is freely available from http://www2.imperial.ac.uk/~gmontana
Affiliation(s)
- Christopher Minas
- Department of Imaging Sciences, Institute of Clinical Sciences, Hammersmith Campus, Statistics Section, Department of Mathematics, South Kensington Campus and Department of Surgery and Cancer, Ovarian Cancer Action Research Centre, Hammersmith Campus, Imperial College London, London W12 0NN, UK
|
15
|
Oliver JC, Tong XL, Gall LF, Piel WH, Monteiro A. A single origin for nymphalid butterfly eyespots followed by widespread loss of associated gene expression. PLoS Genet 2012; 8:e1002893. [PMID: 22916033 PMCID: PMC3420954 DOI: 10.1371/journal.pgen.1002893] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 04/02/2012] [Accepted: 06/26/2012] [Indexed: 12/24/2022] Open
Abstract
Understanding how novel complex traits originate involves investigating the time of origin of the trait, as well as the origin of its underlying gene regulatory network in a broad comparative phylogenetic framework. The eyespot of nymphalid butterflies has served as an example of a novel complex trait, as multiple genes are expressed during eyespot development. Yet the origins of eyespots remain unknown. Using a dataset of more than 400 images of butterflies with a known phylogeny and gene expression data for five eyespot-associated genes from over twenty species, we tested origin hypotheses for both eyespots and eyespot-associated genes. We show that eyespots evolved once within the family Nymphalidae, approximately 90 million years ago, concurrent with expression of at least three genes associated with early eyespot development. We also show multiple losses of expression of most genes from this early three-gene cluster, without corresponding losses of eyespots. We propose that complex traits, such as eyespots, may have originated via co-option of a large pre-existing complex gene regulatory network that was subsequently streamlined of genes not required to fulfill its novel developmental function.
Butterfly eyespots play an essential role in natural and sexual selection, yet the evolutionary origins of eyespots and of their underlying gene regulatory network remain unknown. By scoring phenotypes and wing expression of five genes in 399 and 21 nymphalid species, respectively, we tested when eyespots and expression of their associated genes evolved. We found that the origin of eyespots was concurrent with the origin of the gene expression patterns, approximately 90 million years ago. Following this event, many genes expressed in eyespot development were lost in some lineages without a corresponding loss of eyespots, indicating substantial evolution in the cluster of genes associated with eyespots. This finding suggests that complex traits such as butterfly eyespots may initially evolve by re-deploying pre-existing gene regulatory networks, which are subsequently trimmed of genes that are unnecessary in the novel context.
Affiliation(s)
- Jeffrey C. Oliver
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- * E-mail: (JCO); (AM)
- Xiao-Ling Tong
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- Lawrence F. Gall
- Yale Peabody Museum of Natural History, Yale University, New Haven, Connecticut, United States of America
- William H. Piel
- Yale Peabody Museum of Natural History, Yale University, New Haven, Connecticut, United States of America
- Antónia Monteiro
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, United States of America
- * E-mail: (JCO); (AM)
|
16
|
Ng JWY, Barrett LM, Wong A, Kuh D, Smith GD, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012; 13:246. [PMID: 22747597 PMCID: PMC3446311 DOI: 10.1186/gb-2012-13-6-246] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Indexed: 11/12/2022] Open
Abstract
Longitudinal cohort studies are ideal for investigating how epigenetic patterns change over time and relate to changing exposure patterns and the development of disease. We highlight the challenges and opportunities in this approach.
|
17
|
Ng JWY, Barrett LM, Wong A, Kuh D, Smith G, Relton CL. The role of longitudinal cohort studies in epigenetic epidemiology: challenges and opportunities. Genome Biol 2012. [DOI: 10.1186/gb4029] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Indexed: 12/24/2022] Open
|