1
|
Doncheva AI, Romero S, Ramirez‐Garrastacho M, Lee S, Kolnes KJ, Tangen DS, Olsen T, Drevon CA, Llorente A, Dalen KT, Hjorth M. Extracellular vesicles and microRNAs are altered in response to exercise, insulin sensitivity and overweight. Acta Physiol (Oxf) 2022; 236:e13862. [PMID: 36377504 PMCID: PMC9788120 DOI: 10.1111/apha.13862] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/20/2022] [Revised: 07/11/2022] [Accepted: 07/25/2022] [Indexed: 01/29/2023]
Abstract
Extracellular vesicles induced by exercise have emerged as potential mediators of tissue crosstalk. Extracellular vesicles and their cargo miRNAs have been linked to dysglycemia and obesity in animal models, but their role in humans is unclear. AIM The aim of the study was to characterize the miRNA content in plasma extracellular vesicle isolates after acute and long-term exercise and to study associations between extracellular vesicle miRNAs, mRNA expression in skeletal muscle and adipose tissue, and cardiometabolic risk factors. METHODS Sedentary men with or without dysglycemia and overweight underwent an acute bicycle test and a 12-week exercise intervention with extensive metabolic phenotyping. Gene expression in m. vastus lateralis and subcutaneous adipose tissue was measured with RNA sequencing. Extracellular vesicles were purified from plasma with membrane affinity columns or size exclusion chromatography. RESULTS Extracellular vesicle miRNA profiling revealed a transient increase in the number of miRNAs after acute exercise. We identified miRNAs, such as miR-652-3p, that were associated with insulin sensitivity and adiposity. By performing explorative association analyses, we identified two miRNAs, miR-32-5p and miR-339-3p, that were strongly correlated with an adipose tissue macrophage signature. CONCLUSION Numerous miRNAs in plasma extracellular vesicle isolates were increased by exercise, and several miRNAs correlated with insulin sensitivity and adiposity. Our findings warrant future studies to characterize exercise-induced extracellular vesicles and cargo miRNA, to clarify where exercise-induced extracellular vesicles originate, and to determine whether they influence metabolic health or exercise adaptation.
Affiliation(s)
- Silvana Romero
- Department of Molecular Cell Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Sindre Lee
- Department of Transplantation, Institute of Clinical Medicine, University of Oslo, Oslo, Norway
- Kristoffer J. Kolnes
- Steno Diabetes Center Odense, Odense University Hospital, Odense, Denmark; Department of Physical Performance, Norwegian School of Sport Sciences, Oslo, Norway
- Thomas Olsen
- Department of Nutrition, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Christian A. Drevon
- Department of Nutrition, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Alicia Llorente
- Department of Molecular Cell Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway; Department for Mechanical, Electronics and Chemical Engineering, Oslo Metropolitan University, Oslo, Norway
- Knut Tomas Dalen
- Department of Nutrition, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
- Marit Hjorth
- Department of Nutrition, Institute of Basic Medical Sciences, University of Oslo, Oslo, Norway
|
2
|
Gauran II, Xue G, Chen C, Ombao H, Yu Z. Ridge Penalization in High-Dimensional Testing With Applications to Imaging Genetics. Front Neurosci 2022; 16:836100. [PMID: 35401090 PMCID: PMC8987922 DOI: 10.3389/fnins.2022.836100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2021] [Accepted: 02/24/2022] [Indexed: 11/13/2022]
Abstract
High-dimensionality is ubiquitous in various scientific fields such as imaging genetics, where a deluge of functional and structural data on brain-relevant genetic polymorphisms are investigated. It is crucial to identify which genetic variations are consequential in identifying neurological features of brain connectivity compared to merely random noise. Statistical inference in high-dimensional settings poses multiple challenges involving analytical and computational complexity. A widely implemented strategy in addressing inference goals is penalized inference. In particular, the role of the ridge penalty in high-dimensional prediction and estimation has been actively studied in the past several years. This study focuses on ridge-penalized tests in high-dimensional hypothesis testing problems by proposing and examining a class of methods for choosing the optimal ridge penalty. We present our findings on strategies to improve the statistical power of ridge-penalized tests and what determines the optimal ridge penalty for hypothesis testing. The application of our work to an imaging genetics study and biological research will be presented.
Affiliation(s)
- Iris Ivy Gauran
- Biostatistics Group, Computer, Electrical, Mathematical Sciences, and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Gui Xue
- Center for Brain and Learning Science, Beijing Normal University, Beijing, China
- Chuansheng Chen
- Department of Psychological Science, University of California, Irvine, Irvine, CA, United States
- Hernando Ombao
- Biostatistics Group, Computer, Electrical, Mathematical Sciences, and Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Zhaoxia Yu
- Department of Statistics, University of California, Irvine, Irvine, CA, United States
|
3
|
Bry X, Niang N, Verron T, Bougeard S. Clusterwise elastic-net regression based on a combined information criterion. ADV DATA ANAL CLASSI 2022. [DOI: 10.1007/s11634-021-00489-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
|
4
|
On the Use of Correlation and MI as a Measure of Metabolite-Metabolite Association for Network Differential Connectivity Analysis. Metabolites 2020; 10:metabo10040171. [PMID: 32344593 PMCID: PMC7241243 DOI: 10.3390/metabo10040171] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/25/2020] [Revised: 04/15/2020] [Accepted: 04/22/2020] [Indexed: 02/06/2023]
Abstract
Metabolite differential connectivity analysis has been successful in investigating potential molecular mechanisms underlying different conditions in biological systems. Correlation and Mutual Information (MI) are two of the most common measures to quantify association and for building metabolite-metabolite association networks and to calculate differential connectivity. In this study, we investigated the performance of correlation and MI to identify significantly differentially connected metabolites. These association measures were compared on (i) 23 publicly available metabolomic data sets and 7 data sets from other fields, (ii) simulated data with known correlation structures, and (iii) data generated using a dynamic metabolic model to simulate real-life observed metabolite concentration profiles. In all cases, we found more differentially connected metabolites when using correlation indices as a measure for association than MI. We also observed that different MI estimation algorithms resulted in difference in performance when applied to data generated using a dynamic model. We concluded that there is no significant benefit in using MI as a replacement for standard Pearson's or Spearman's correlation when the application is to quantify and detect differentially connected metabolites.
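As an illustration of the idea in this abstract, the following is a minimal sketch of correlation-based differential connectivity. It rests on assumptions not stated in the abstract: connectivity is taken here to be the sum of absolute Pearson correlations between a metabolite and all others, differential connectivity is the difference of that sum between two conditions, and the function names (`pearson`, `connectivity`, `differential_connectivity`) are hypothetical. The significance testing used in the study is not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(data):
    """data maps metabolite name -> concentrations across samples.

    Assumed definition: sum of |r| between a metabolite and every other one.
    """
    names = list(data)
    return {
        a: sum(abs(pearson(data[a], data[b])) for b in names if b != a)
        for a in names
    }


def differential_connectivity(cond1, cond2):
    """Per-metabolite difference in connectivity between two conditions."""
    c1, c2 = connectivity(cond1), connectivity(cond2)
    return {m: c1[m] - c2[m] for m in c1}


# Toy data (invented for illustration): A and B co-vary only in condition 1,
# B and C co-vary only in condition 2.
cond1 = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, -1, 1]}
cond2 = {"A": [1, 2, 3, 4], "B": [1, -1, -1, 1], "C": [1, -1, -1, 1]}
dc = differential_connectivity(cond1, cond2)
```

A mutual information estimator could be substituted for `pearson` to compare the two association measures on the same data.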
|
5
|
Abstract
A simple and fast k-medoids algorithm that updates medoids by minimizing the total distance within clusters has been developed. Although it is simple and fast, as its name suggests, it neglects the local optima and empty clusters that may arise. Because the distance is an input to the algorithm, a generalized distance function is developed to increase the variation of the distances, especially for mixed variable datasets. The variation of the distances is a crucial part of a partitioning algorithm because different distances produce different outcomes. In experiments, the simple k-medoids algorithm performs consistently well in various settings of mixed variable data. It also has a high cluster accuracy compared to other distance-based partitioning algorithms for mixed variable data.
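The medoid update described in this abstract can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the function name `simple_k_medoids`, the choice to keep the old medoid when a cluster becomes empty, and the toy one-dimensional data are all hypothetical, and the generalized distance for mixed variables is not reproduced (any precomputed distance matrix can be passed in).

```python
def simple_k_medoids(dist, medoids, max_iter=100):
    """Cluster points given a symmetric n x n distance matrix.

    dist    : list of lists, dist[i][j] is the distance between points i and j
    medoids : initial medoid indices (one per cluster)
    Returns the final medoid indices and a cluster label for each point.
    """
    n = len(dist)
    medoids = list(medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid of a cluster is the member with the
        # smallest total distance to all members of that cluster.
        new_medoids = []
        for m in medoids:
            members = clusters[m]
            if not members:
                # Empty cluster (possible with duplicate points): as an
                # assumption of this sketch, keep the previous medoid.
                new_medoids.append(m)
                continue
            best = min(members, key=lambda c: sum(dist[c][j] for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break  # converged, possibly to a local optimum
        medoids = new_medoids
    labels = [
        min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
        for i in range(n)
    ]
    return medoids, labels


# Toy example: six points on a line forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
final_medoids, labels = simple_k_medoids(dist, medoids=[0, 1])
```

Because the result depends on the initial medoids, running the sketch from several starting sets and keeping the solution with the smallest total within-cluster distance is one common way to reduce the risk of a poor local optimum.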
|
6
|
Lee S, Olsen T, Vinknes KJ, Refsum H, Gulseth HL, Birkeland KI, Drevon CA. Plasma Sulphur-Containing Amino Acids, Physical Exercise and Insulin Sensitivity in Overweight Dysglycemic and Normal Weight Normoglycemic Men. Nutrients 2018; 11:nu11010010. [PMID: 30577516 PMCID: PMC6356487 DOI: 10.3390/nu11010010] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 11/23/2018] [Revised: 12/19/2018] [Accepted: 12/19/2018] [Indexed: 12/30/2022]
Abstract
Plasma sulphur-containing amino acids and related metabolites are associated with insulin sensitivity, although the mechanisms are unclear. We examined the effect of exercise on this relationship. Dysglycemic (n = 13) and normoglycemic (n = 13) men underwent 45 min cycling before and after 12 weeks exercise intervention. We performed hyperinsulinemic euglycemic clamp, mRNA-sequencing of skeletal muscle and adipose tissue biopsies, and targeted profiling of plasma metabolites by LC-MS/MS. Insulin sensitivity increased similarly in dysglycemic and normoglycemic men after 12 weeks of exercise, in parallel to similar increases in concentration of plasma glutamine, and decreased concentrations of plasma glutamate, cysteine, taurine, and glutathione. Change in plasma concentrations of cysteine and glutathione exhibited the strongest correlations to exercise-improved insulin sensitivity, and expression of a cluster of genes essential for oxidative phosphorylation and fatty acid metabolism in both skeletal muscle and adipose tissue, as well as mitochondria-related genes such as mitofilin. Forty-five min of cycling decreased plasma concentrations of glutamine and methionine, and increased plasma concentrations of glutamate, homocysteine, cystathionine, cysteine, glutathione, and taurine. Similar acute responses were seen in both groups before and after the 12 weeks training period. Both acute and long-term exercise may influence transsulphuration and glutathione biosynthesis, linking exercise-improved insulin sensitivity to oxidative stress and mitochondrial function.
Affiliation(s)
- Sindre Lee
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway.
- Department of Endocrinology, Morbid Obesity and Preventive Medicine, Oslo University Hospital, 0586 Oslo, Norway.
- Thomas Olsen
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway.
- Kathrine J Vinknes
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway.
- Helga Refsum
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway.
- Hanne L Gulseth
- Department of Endocrinology, Morbid Obesity and Preventive Medicine, Oslo University Hospital, 0586 Oslo, Norway.
- Department of Non-communicable Diseases, Norwegian Institute of Public Health, 0473 Oslo, Norway.
- Kåre I Birkeland
- Department of Endocrinology, Morbid Obesity and Preventive Medicine, Oslo University Hospital, 0586 Oslo, Norway.
- Institute of Clinical Medicine, Faculty of Medicine, University of Oslo, 0450 Oslo, Norway.
- Christian A Drevon
- Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, 0317 Oslo, Norway.
|
7
|
Verification of Three-Phase Dependency Analysis Bayesian Network Learning Method for Maize Carotenoid Gene Mining. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1813494. [PMID: 28828382 PMCID: PMC5554554 DOI: 10.1155/2017/1813494] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2017] [Accepted: 06/27/2017] [Indexed: 11/17/2022]
Abstract
Background and Objective Mining the genes related to maize carotenoid components is important for improving the carotenoid content and the quality of maize. Methods Using an entropy estimation method with a Gaussian kernel probability density estimator, we apply the three-phase dependency analysis (TPDA) Bayesian network structure learning method to construct a network of maize genes and carotenoid component traits. Results Using two discretization methods with different discretization values, we compare the learning effect and efficiency of 10 Bayesian network structure learning methods. The method is verified and analyzed on a maize dataset from a global germplasm collection of 527 elite inbred lines. Conclusions The results confirm the effectiveness of the TPDA method, which significantly outperforms the other 9 Bayesian network learning methods. It is an efficient method for mining genes for maize carotenoid component traits. The parameters obtained in these experiments will help carry out practical gene mining effectively in the future.
|
8
|
Voillet V, Besse P, Liaubet L, San Cristobal M, González I. Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework. BMC Bioinformatics 2016; 17:402. [PMID: 27716030 PMCID: PMC5048483 DOI: 10.1186/s12859-016-1273-5] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/15/2016] [Accepted: 09/21/2016] [Indexed: 12/17/2022]
Abstract
Background In omics data integration studies, it is common, for a variety of reasons, for some individuals to not be present in all data tables. Missing row values are challenging to deal with because most statistical methods cannot be directly applied to incomplete datasets. To overcome this issue, we propose a multiple imputation (MI) approach in a multivariate framework. In this study, we focus on multiple factor analysis (MFA) as a tool to compare and integrate multiple layers of information. MI involves filling the missing rows with plausible values, resulting in M completed datasets. MFA is then applied to each completed dataset to produce M different configurations (the matrices of coordinates of individuals). Finally, the M configurations are combined to yield a single consensus solution. Results We assessed the performance of our method, named MI-MFA, on two real omics datasets. Incomplete artificial datasets with different patterns of missingness were created from these data. The MI-MFA results were compared with two other approaches i.e., regularized iterative MFA (RI-MFA) and mean variable imputation (MVI-MFA). For each configuration resulting from these three strategies, the suitability of the solution was determined against the true MFA configuration obtained from the original data and a comprehensive graphical comparison showing how the MI-, RI- or MVI-MFA configurations diverge from the true configuration was produced. Two approaches i.e., confidence ellipses and convex hulls, to visualize and assess the uncertainty due to missing values were also described. We showed how the areas of ellipses and convex hulls increased with the number of missing individuals. A free and easy-to-use code was proposed to implement the MI-MFA method in the R statistical environment. Conclusions We believe that MI-MFA provides a useful and attractive method for estimating the coordinates of individuals on the first MFA components despite missing rows. 
MI-MFA configurations were close to the true configuration even when many individuals were missing in several data tables. This method takes into account the uncertainty of MI-MFA configurations induced by the missing rows, thereby allowing the reliability of the results to be evaluated.
Affiliation(s)
- Valentin Voillet
- Université de Toulouse, INRA, INPT, INP-ENVT, UMR1388, GenPhySE, Castanet-Tolosan, F-31326, France
- Philippe Besse
- Université de Toulouse INSA, UMR5219 Institut de Mathématiques, Toulouse, F-31077, France
- Laurence Liaubet
- Université de Toulouse, INRA, INPT, INP-ENVT, UMR1388, GenPhySE, Castanet-Tolosan, F-31326, France
- Magali San Cristobal
- Université de Toulouse, INRA, INPT, INP-ENVT, UMR1388, GenPhySE, Castanet-Tolosan, F-31326, France; Université de Toulouse INSA, UMR5219 Institut de Mathématiques, Toulouse, F-31077, France
- Ignacio González
- INRA, UR875 Mathématiques et Informatiques Appliquées, F-31326, Castanet-Tolosan, France
|
9
|
Irigoien I, Arenas C. Diagnosis using clinical/pathological and molecular information. Stat Methods Med Res 2016; 25:2878-2894. [DOI: 10.1177/0962280214534410] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
In the diagnosis and classification of diseases, multiple outcomes, both molecular and clinical/pathological, are routinely gathered on patients. In recent years, many approaches have been suggested for integrating gene expression (continuous data) with clinical/pathological data (usually categorical and ordinal data). This new area of research integrates both clinical and genomic data in order to improve our knowledge about diseases and to capture the information that is lost in independent clinical or genomic studies. The related metric scaling distance is not well known, but it is a very valuable distance for integrating clinical/pathological and molecular information. In this article, we present the use of the related metric scaling distance in biomedical research. We describe how this distance works, and we also explain why it may sometimes be preferred. We discuss the choice of the related metric scaling distance and compare it with other proximity measures that include both clinical and genetic information. Furthermore, we comment on the choice of the related metric scaling distance when classical clustering or discriminant analysis based on distances is performed, and compare the results with more complex clustering or discriminant procedures specially constructed for integrating clinical and molecular information. The use of the related metric scaling distance is illustrated on simulated experimental data and four real data sets: one heart disease study and three cancer studies. The results show the flexibility and availability of this distance, which gives competitive results.
Affiliation(s)
- Itziar Irigoien
- Department of Computation and Artificial Intelligence, Euskal Herriko Unibertsitatea UPV-EHU, Donostia, Spain
- Concepción Arenas
- Departament d’Estadística, Universitat de Barcelona, Barcelona, Spain
|
10
|
Jennen DGJ, van Leeuwen DM, Hendrickx DM, Gottschalk RWH, van Delft JHM, Kleinjans JCS. Bayesian Network Inference Enables Unbiased Phenotypic Anchoring of Transcriptomic Responses to Cigarette Smoke in Humans. Chem Res Toxicol 2015; 28:1936-48. [PMID: 26360787 DOI: 10.1021/acs.chemrestox.5b00145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]
Abstract
Microarray-based transcriptomic analysis has been demonstrated to hold the opportunity to study the effects of human exposure to, e.g., chemical carcinogens at the whole genome level, thus yielding broad-ranging molecular information on possible carcinogenic effects. Since genes do not operate individually but rather through concerted interactions, analyzing and visualizing networks of genes should provide important mechanistic information, especially upon connecting them to functional parameters, such as those derived from measurements of biomarkers for exposure and carcinogenic risk. Conventional methods such as hierarchical clustering and correlation analyses are frequently used to address these complex interactions but are limited as they do not provide directional causal dependence relationships. Therefore, our aim was to apply Bayesian network inference with the purpose of phenotypic anchoring of modified gene expressions. We investigated a use case on transcriptomic responses to cigarette smoking in humans, in association with plasma cotinine levels as biomarkers of exposure and aromatic DNA-adducts in blood cells as biomarkers of carcinogenic risk. Many of the genes that appear in the Bayesian networks surrounding plasma cotinine, and to a lesser extent around aromatic DNA-adducts, hold biologically relevant functions in inducing severe adverse effects of smoking. In conclusion, this study shows that Bayesian network inference enables unbiased phenotypic anchoring of transcriptomics responses. Furthermore, in all inferred Bayesian networks several dependencies are found which point to known but also to new relationships between the expression of specific genes, cigarette smoke exposure, DNA damaging-effects, and smoking-related diseases, in particular associated with apoptosis, DNA repair, and tumor suppression, as well as with autoimmunity.
Affiliation(s)
- Danyel G J Jennen
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Danitsja M van Leeuwen
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Diana M Hendrickx
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Ralph W H Gottschalk
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Joost H M van Delft
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
- Jos C S Kleinjans
- Department of Toxicogenomics, Maastricht University, Universiteitssingel 40, 6229 ER Maastricht, The Netherlands
|
11
|
Ferraty F, Hall P. An Algorithm for Nonlinear, Nonparametric Model Choice and Prediction. J Comput Graph Stat 2015. [DOI: 10.1080/10618600.2014.936605] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
|
12
|
Hierarchical expression of genes controlled by the Bacillus subtilis global regulatory protein CodY. Proc Natl Acad Sci U S A 2014; 111:8227-32. [PMID: 24843172 DOI: 10.1073/pnas.1321308111] [Citation(s) in RCA: 77] [Impact Index Per Article: 7.0] [Indexed: 11/18/2022]
Abstract
Global regulators that bind strategic metabolites allow bacteria to adapt rapidly to dynamic environments by coordinating the expression of many genes. We report an approach for determining gene regulation hierarchy using the regulon of the Bacillus subtilis global regulatory protein CodY as proof of principle. In theory, this approach can be used to measure the dynamics of any bacterial transcriptional regulatory network that is affected by interaction with a ligand. In B. subtilis, CodY controls dozens of genes, but the threshold activities of CodY required to regulate each gene are unknown. We hypothesized that targets of CodY are differentially regulated based on varying affinity for the protein's many binding sites. We used RNA sequencing to determine the transcription profiles of B. subtilis strains expressing mutant CodY proteins with different levels of residual activity. In parallel, we quantified intracellular metabolites connected to central metabolism. Strains producing CodY variants F71Y, R61K, and R61H retained varying degrees of partial activity relative to the WT protein, leading to gene-specific, differential alterations in transcript abundance for the 223 identified members of the CodY regulon. Using liquid chromatography coupled to MS, we detected significant increases in branched-chain amino acids and intermediates of arginine, proline, and glutamate metabolism, as well as decreases in pyruvate and glycerate as CodY activity decreased. We conclude that a spectrum of CodY activities leads to programmed regulation of gene expression and an apparent rerouting of carbon and nitrogen metabolism, suggesting that during changes in nutrient availability, CodY prioritizes the expression of specific pathways.
|
13
|
Williams-DeVane CR, Reif DM, Hubal EC, Bushel PR, Hudgens EE, Gallagher JE, Edwards SW. Decision tree-based method for integrating gene expression, demographic, and clinical data to determine disease endotypes. BMC SYSTEMS BIOLOGY 2013; 7:119. [PMID: 24188919 PMCID: PMC4228284 DOI: 10.1186/1752-0509-7-119] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 09/18/2012] [Accepted: 10/18/2013] [Indexed: 12/30/2022]
Abstract
Background Complex diseases are often difficult to diagnose, treat and study due to the multi-factorial nature of the underlying etiology. Large data sets are now widely available that can be used to define novel, mechanistically distinct disease subtypes (endotypes) in a completely data-driven manner. However, significant challenges exist with regard to how to segregate individuals into suitable subtypes of the disease and understand the distinct biological mechanisms of each when the goal is to maximize the discovery potential of these data sets. Results A multi-step decision tree-based method is described for defining endotypes based on gene expression, clinical covariates, and disease indicators using childhood asthma as a case study. We attempted to use alternative approaches such as the Student’s t-test, single data domain clustering and the Modk-prototypes algorithm, which incorporates multiple data domains into a single analysis, but none performed as well as the novel multi-step decision tree method. This new method gave the best segregation of asthmatics and non-asthmatics, and it provides easy access to all genes and clinical covariates that distinguish the groups. Conclusions The multi-step decision tree method described here will lead to better understanding of complex disease in general by allowing purely data-driven disease endotypes to facilitate the discovery of new mechanisms underlying these diseases. This application should be considered a complement to ongoing efforts to better define and diagnose known endotypes. When coupled with existing methods developed to determine the genetics of gene expression, these methods provide a mechanism for linking genetics and exposomics data and thereby accounting for both major determinants of disease.
Affiliation(s)
- Clarlynda R Williams-DeVane
- National Health and Environmental Effects Research Laboratory - Integrated Systems Toxicology Division, U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC 27711, USA.
|
14
|
Valour D, Hue I, Grimard B, Valour B. Gene selection heuristic algorithm for nutrigenomics studies. Physiol Genomics 2013; 45:615-28. [PMID: 23632420 DOI: 10.1152/physiolgenomics.00139.2012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
Abstract
Large datasets from -omics studies need to be deeply investigated. The aim of this paper is to provide a new method (LEM method) to search for connections between the transcriptome and the metabolome. The heuristic algorithm described here extends classical canonical correlation analysis (CCA) to a high number of variables (without regularization) and combines well-conditioning and fast computing in R. Reduced CCA models are summarized in PageRank matrices, the product of which gives a stochastic matrix that summarizes the self-avoiding walk covered by the algorithm. Then, a homogeneous Markov process applied to this stochastic matrix makes the probabilities of interconnection between genes converge, providing a selection of disjoint subsets of genes. This is an alternative to regularized generalized CCA for the determination of blocks within the structure matrix. Each gene subset is thus linked to the whole metabolic or clinical dataset that represents the biological phenotype of interest. Moreover, this selection process meets the needs of biologists, who often require small sets of genes for further validation or extended phenotyping. The algorithm is shown to work efficiently on three published datasets, resulting in meaningfully broadened gene networks.
Affiliation(s)
- D Valour
- INRA, UMR 1198 Biologie du Développement et Reproduction, Jouy-en-Josas, France
|
15
|
González I, Cao KAL, Davis MJ, Déjean S. Visualising associations between paired 'omics' data sets. BioData Min 2012; 5:19. [PMID: 23148523 PMCID: PMC3630015 DOI: 10.1186/1756-0381-5-19] [Citation(s) in RCA: 202] [Impact Index Per Article: 15.5] [Received: 02/10/2012] [Accepted: 10/15/2012] [Indexed: 12/11/2022]
Abstract
BACKGROUND Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics and interactomics data are compiled at an ever increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities. RESULTS The multivariate statistical approaches 'regularized Canonical Correlation Analysis' and 'sparse Partial Least Squares regression' were recently developed to integrate two types of high-dimensional 'omics' data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two 'omics' data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis. CONCLUSIONS Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole. AVAILABILITY The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.
Affiliation(s)
- Ignacio González
- Institut de Mathématiques - Université de Toulouse III et CNRS, UMR 5219, F-31062 Toulouse, France
- Kim-Anh Lê Cao
- Queensland Facility for Advanced Bioinformatics and the Institute for Molecular Bioscience, The University of Queensland, 4072 St Lucia, QLD, Australia
- Melissa J Davis
- Queensland Facility for Advanced Bioinformatics and the Institute for Molecular Bioscience, The University of Queensland, 4072 St Lucia, QLD, Australia
- Sébastien Déjean
- Institut de Mathématiques - Université de Toulouse III et CNRS, UMR 5219, F-31062 Toulouse, France
|
16
|
Ulrich T. Pareto-Set Analysis: Biobjective Clustering in Decision and Objective Spaces. Journal of Multi-Criteria Decision Analysis 2012. [DOI: 10.1002/mcda.1477] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Affiliation(s)
- Tamara Ulrich
- Computer Engineering and Networks Laboratory; ETH Zurich; Zurich Switzerland
|
17
|
Inkielewicz-Stępniak I, Knap N. Effect of exposure to fluoride and acetaminophen on oxidative/nitrosative status of liver and kidney in male and female rats. Pharmacol Rep 2012; 64:902-11. [DOI: 10.1016/s1734-1140(12)70885-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 11/15/2011] [Revised: 04/16/2012] [Indexed: 10/25/2022]
|
18
|
Yao F, Coquery J, Lê Cao KA. Independent Principal Component Analysis for biologically meaningful dimension reduction of large biological data sets. BMC Bioinformatics 2012; 13:24. [PMID: 22305354 PMCID: PMC3298499 DOI: 10.1186/1471-2105-13-24] [Citation(s) in RCA: 83] [Impact Index Per Article: 6.4] [Received: 09/05/2011] [Accepted: 02/03/2012] [Indexed: 11/13/2022]
Abstract
BACKGROUND A key question when analyzing high throughput data is whether the information provided by the measured biological entities (gene, metabolite expression for example) is related to the experimental conditions, or, rather, to some interfering signals, such as experimental bias or artefacts. Visualization tools are therefore useful to better understand the underlying structure of the data in a 'blind' (unsupervised) way. A well-established technique to do so is Principal Component Analysis (PCA). PCA is particularly powerful if the biological question is related to the highest variance. Independent Component Analysis (ICA) has been proposed as an alternative to PCA as it optimizes an independence condition to give more meaningful components. However, neither PCA nor ICA can overcome both the high dimensionality and noisy characteristics of biological data. RESULTS We propose Independent Principal Component Analysis (IPCA) that combines the advantages of both PCA and ICA. It uses ICA as a denoising process of the loading vectors produced by PCA to better highlight the important biological entities and reveal insightful patterns in the data. The result is a better clustering of the biological samples on graphical representations. In addition, a sparse version is proposed that performs an internal variable selection to identify biologically relevant features (sIPCA). CONCLUSIONS On simulation studies and real data sets, we showed that IPCA offers a better visualization of the data than ICA and with a smaller number of components than PCA. Furthermore, a preliminary investigation of the list of genes selected with sIPCA demonstrates that the approach is well able to highlight relevant genes in the data with respect to the biological experiment. IPCA and sIPCA are both implemented in the R package mixOmics dedicated to the analysis and exploration of high dimensional biological data sets, and on mixOmics' web-interface.
Affiliation(s)
- Fangzhou Yao
- Queensland Facility for Advanced Bioinformatics, University of Queensland, St Lucia, Australia
|
19
|
Dymacek J, Guo NL. Systems Approach to Identifying Relevant Pathways from Phenotype Information in Dose-Dependent Time Series Microarray Data. Proceedings. IEEE International Conference on Bioinformatics and Biomedicine 2011; 2011:290-293. [PMID: 25984395 PMCID: PMC4429298 DOI: 10.1109/bibm.2011.76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
This study presents a novel computational approach to find relevant pathways from dose-dependent time series gene expression data that are significantly associated with phenotype (pathological) patterns, based on a comprehensive evaluation of a database of pathways. Our system uses four steps: 1) identify a set of genes which change significantly in dose or time; 2) find phenotype patterns and gene coefficients for the genes found in step 1; 3) expand to genome-wide coefficients; and 4) identify pathways which are significantly relevant to a phenotype pattern. Our technique finds biologically relevant pathways with and without phenotype constraints. Our system has been used on genome-wide expression profiles of mouse lungs (n=160) following aspiration of well-dispersed multi-walled carbon nanotubes (MWCNT), in order to detect MWCNT-induced lung inflammation and related pathways. The identified significant pathways are supported by evidence in the literature and biological validation.
Affiliation(s)
- Julian Dymacek
- Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV 26506, USA
- Nancy Lan Guo
- Mary Babb Randolph Cancer Center, West Virginia University, Morgantown, WV 26506, USA
|
20
|
Baralis E, Bruno G, Fiori A. Measuring gene similarity by means of the classification distance. Knowl Inf Syst 2011. [DOI: 10.1007/s10115-010-0374-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
|
21
|
Afshari CA, Hamadeh HK, Bushel PR. The evolution of bioinformatics in toxicology: advancing toxicogenomics. Toxicol Sci 2010; 120 Suppl 1:S225-37. [PMID: 21177775 DOI: 10.1093/toxsci/kfq373] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Indexed: 02/01/2023]
Abstract
As one reflects back through the past 50 years of scientific research, a significant accomplishment was the advance into the genomic era. Basic research scientists have uncovered the genetic code and the foundation of the most fundamental building blocks for the molecular activity that supports biological structure and function. Accompanying these structural and functional discoveries is the advance of techniques and technologies to probe molecular events, in time, across environmental and chemical exposures, within individuals, and across species. The field of toxicology has kept pace with advances in molecular study, and the past 50 years recognizes significant growth and explosive understanding of the impact of the compounds and environment to basic cellular and molecular machinery. The advancement of molecular techniques applied in a whole-genomic capacity to the study of toxicant effects, toxicogenomics, is no doubt a significant milestone for toxicological research. Toxicogenomics has also provided an avenue for advancing a joining of multidisciplinary sciences including engineering and informatics in traditional toxicological research. This review will cover the evolution of the field of toxicogenomics in the context of informatics integration, its current promise, and limitations.
Affiliation(s)
- Cynthia A Afshari
- Department of Comparative Biology and Safety Sciences, Amgen Inc., Thousand Oaks, California 91320, USA.
|
22
|
Manamperi A. Current developments in genomics and personalized health care: impact on public health. Asia Pac J Public Health 2009; 20:242-50. [PMID: 19124318 DOI: 10.1177/1010539508316783] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
The knowledge gained from the characterization of genomes, especially the human genome, holds considerable potential for the development of new health care innovations for prevention, diagnosis, and management of many diseases in the coming decade. However, owing to the presence of highly complex scientific, economic, social, and ethical issues associated with this field, societies will need to be better prepared for the era of postgenomics and its consequences. It is important to ensure that the benefits of genomics are distributed fairly among all the countries of the world and that the well-tried and more conventional approaches to medical research and practice are not neglected while the medical potential of genomics is being explored. In this report, the author focuses mainly on human genomics, its applications, development of related technologies and issues related to the dissemination of knowledge derived from genome information, and finally, their impact on global health care.
Affiliation(s)
- Aresha Manamperi
- Molecular Medicine Unit, Faculty of Medicine, University of Kelaniya, Sri Lanka.
|
23
|
Nigsch F, Macaluso NJM, Mitchell JBO, Zmuidinavicius D. Computational toxicology: an overview of the sources of data and of modelling methods. Expert Opin Drug Metab Toxicol 2009; 5:1-14. [PMID: 19236225 DOI: 10.1517/17425250802660467] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 01/09/2023]
Abstract
BACKGROUND Toxicology has the goal of ensuring the safety of humans, animals and the environment. Computational toxicology is an area of active development and great potential. There are tangible reasons for the emerging interest in this discipline from academia, industry, regulatory bodies and governments. RESULTS Pharmaceuticals, personal health care products, nutritional ingredients and products of the chemical industries are all potential hazards and need to be assessed. Toxicological tests for these products are costly, frequently use laboratory animals and are time-consuming. This delays end-user access to improved products or, conversely, the timely withdrawal of dangerous substances from the market. The aim of computational toxicology is to accelerate the assessment of potentially dangerous substances through in silico models. CONCLUSIONS In this review, we provide an overview of the development of models for computational toxicology. Addressing the significant divide between the experimental and computational worlds (believed to be a prime hindrance to computational toxicology), we briefly consider the fundamental issue of toxicological data and the assays they stem from. Different kinds of models that can be built using such data are presented: computational filters, models for specific toxicological endpoints and tools for the generation of testable hypotheses.
Affiliation(s)
- Florian Nigsch
- Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK.
|
24
|
Bushel PR. Clustering of gene expression data and end-point measurements by simulated annealing. J Bioinform Comput Biol 2009; 7:193-215. [PMID: 19226667 DOI: 10.1142/s021972000900400x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/09/2008] [Revised: 09/23/2008] [Accepted: 11/15/2008] [Indexed: 11/18/2022]
Abstract
Most clustering techniques do not incorporate phenotypic data. Limited biological interpretation is garnered from the informal process of clustering biological samples and then labeling groups with the phenotypes of the samples. A more formal approach of clustering samples is presented. The method utilizes simulated annealing of the Modk-prototypes objective function. Separate weighting terms are used for microarray, clinical chemistry, and histopathology measurements to control the influence of each data domain on the clustering of the samples. The weights are adapted during the clustering process. A cluster's prototype is representative of the phenotype of the cluster members. Genes are extracted from phenotypic prototypes obtained from the livers of rats exposed to acetaminophen (an analgesic and antipyretic agent) that differed in the extent of centrilobular necrosis. Map kinase signaling and linoleic acid metabolism were significant biological processes influenced by the exposures of acetaminophen that manifested centrilobular necrosis.
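The general idea of annealing a prototype-based clustering objective over mixed numeric and categorical measurements can be illustrated with a minimal Python sketch. It is not the Modk-prototypes method of the abstract: it assumes a plain k-prototypes cost with a single fixed weight `gamma` on categorical mismatches (the abstract describes separate, adaptive weights per data domain), and all function names, parameters and the cooling schedule are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def prototypes(numeric, categorical, labels, k):
    """Per-cluster prototype: mean of numeric features, mode of categorical ones."""
    protos = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            protos.append(None)  # an empty cluster contributes nothing to the cost
            continue
        mean = [sum(numeric[i][d] for i in members) / len(members)
                for d in range(len(numeric[0]))]
        mode = [Counter(categorical[i][d] for i in members).most_common(1)[0][0]
                for d in range(len(categorical[0]))]
        protos.append((mean, mode))
    return protos


def cost(numeric, categorical, labels, k, gamma):
    """Squared Euclidean distance plus gamma times the categorical mismatch count."""
    protos = prototypes(numeric, categorical, labels, k)
    total = 0.0
    for i, lab in enumerate(labels):
        mean, mode = protos[lab]
        total += sum((x - m) ** 2 for x, m in zip(numeric[i], mean))
        total += gamma * sum(a != b for a, b in zip(categorical[i], mode))
    return total


def anneal(numeric, categorical, k, gamma=1.0, t_start=1.0, t_end=1e-3,
           alpha=0.95, moves_per_temp=50, seed=0):
    """Simulated annealing over cluster labels; requires k >= 2."""
    rng = random.Random(seed)
    n = len(numeric)
    labels = [rng.randrange(k) for _ in range(n)]
    current = cost(numeric, categorical, labels, k, gamma)
    best_labels, best = labels[:], current
    t = t_start
    while t > t_end:
        for _ in range(moves_per_temp):
            i = rng.randrange(n)
            old = labels[i]
            new = rng.randrange(k - 1)
            if new >= old:
                new += 1  # guarantees the proposed cluster differs from the old one
            labels[i] = new
            candidate = cost(numeric, categorical, labels, k, gamma)
            delta = candidate - current
            # Metropolis rule: always accept improvements, sometimes accept worse moves
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if current < best:
                    best, best_labels = current, labels[:]
            else:
                labels[i] = old  # reject the move
        t *= alpha  # geometric cooling
    return best_labels, best
```

Calling `anneal(numeric, categorical, k=2)` with one list of numeric rows and one list of categorical rows per sample returns the best labelling found and its cost. Recomputing the full cost on every proposal keeps the sketch short; an incremental update would be preferable for realistic data sizes.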
Affiliation(s)
- Pierre R Bushel
- Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
|
25
|
Kiyosawa N, Ando Y, Manabe S, Yamoto T. Toxicogenomic biomarkers for liver toxicity. J Toxicol Pathol 2009; 22:35-52. [PMID: 22271975 PMCID: PMC3246017 DOI: 10.1293/tox.22.35] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 11/18/2008] [Accepted: 11/26/2008] [Indexed: 12/15/2022]
Abstract
Toxicogenomics (TGx) is a widely used technique in the preclinical stage of drug development to investigate the molecular mechanisms of toxicity. A number of candidate TGx biomarkers have now been identified and are utilized for both assessing and predicting toxicities. Further accumulation of novel TGx biomarkers will lead to more efficient, appropriate and cost effective drug risk assessment, reinforcing the paradigm of the conventional toxicology system with a more profound understanding of the molecular mechanisms of drug-induced toxicity. In this paper, we overview some practical strategies as well as obstacles for identifying and utilizing TGx biomarkers based on microarray analysis. Since clinical hepatotoxicity is one of the major causes of drug development attrition, the liver has been the best documented target organ for TGx studies to date, and we therefore focused on information from liver TGx studies. In this review, we summarize the current resources in the literature in regard to TGx studies of the liver, from which toxicologists could extract potential TGx biomarker gene sets for better hepatotoxicity risk assessment.
Affiliation(s)
- Naoki Kiyosawa
- Medicinal Safety Research Labs., Daiichi Sankyo Co., Ltd., 717 Horikoshi, Fukuroi, Shizuoka 437-0065, Japan
|
26
|
Fostel JM. Towards standards for data exchange and integration and their impact on a public database such as CEBS (Chemical Effects in Biological Systems). Toxicol Appl Pharmacol 2008; 233:54-62. [PMID: 18680759 DOI: 10.1016/j.taap.2008.06.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 10/21/2022]
Abstract
Integration, re-use and meta-analysis of high content study data, typical of DNA microarray studies, can increase its scientific utility. Access to study data and design parameters would enhance the mining of data integrated across studies. However, without standards for which data to include in exchange, and common exchange formats, publication of high content data is time-consuming and often prohibitive. The MGED Society (www.mged.org) was formed in response to the widespread publication of microarray data, and the recognition of the utility of data re-use for meta-analysis. The NIEHS has developed the Chemical Effects in Biological Systems (CEBS) database, which can manage and integrate study data and design from biological and biomedical studies. As community standards are developed for study data and metadata it will become increasingly straightforward to publish high content data in CEBS, where they will be available for meta-analysis. Different exchange formats for study data are being developed: Standard for Exchange of Nonclinical Data (SEND; www.cdisc.org); Tox-ML (www.Leadscope.com) and Simple Investigation Formatted Text (SIFT) from the NIEHS. Data integration can be done at the level of conclusions about responsive genes and phenotypes, and this workflow is supported by CEBS. CEBS also integrates raw and preprocessed data within a given platform. The utility of, and a method for, integrating data within and across DNA microarray studies are shown in an example analysis using DrugMatrix data deposited in CEBS by Iconix Pharmaceuticals.
Affiliation(s)
- Jennifer M Fostel
- Global Health Sector, SRA International, Inc., LLC, Durham, North Carolina, USA.
|
27
|
Huang L, Heinloth AN, Zeng ZB, Paules RS, Bushel PR. Genes related to apoptosis predict necrosis of the liver as a phenotype observed in rats exposed to a compendium of hepatotoxicants. BMC Genomics 2008; 9:288. [PMID: 18558008 PMCID: PMC2478688 DOI: 10.1186/1471-2164-9-288] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 01/15/2008] [Accepted: 06/16/2008] [Indexed: 01/20/2023]
Abstract
BACKGROUND Some of the biochemical events that lead to necrosis of the liver are well-known. However, the pathogenesis of necrosis of the liver from exposure to hepatotoxicants is a complex biological response to the injury. We hypothesize that gene expression profiles can serve as a signature to predict the level of necrosis elicited by acute exposure of rats to a variety of hepatotoxicants and postulate that the expression profiles of the predictor genes in the signature can provide insight into some of the biological processes and molecular pathways that may be involved in the manifestation of necrosis of the rat liver. RESULTS Rats were treated individually with one of seven known hepatotoxicants and were analyzed for gene expression by microarray. Liver samples were grouped by the level of necrosis exhibited in the tissue. Analysis of significantly differentially expressed genes between adjacent necrosis levels revealed that inflammation follows programmed cell death in response to the agents. Using a Random Forest classifier with feature selection, 21 informative genes were identified which achieved 90%, 80% and 60% prediction accuracies of necrosis against independent test data derived from the livers of rats exposed to acetaminophen, carbon tetrachloride, and allyl alcohol, respectively. Pathway and gene network analyses of the genes in the signature revealed several gene interactions suggestive of apoptosis as a process possibly involved in the manifestation of necrosis of the liver from exposure to the hepatotoxicants. Cytotoxic effects of TNF-α, as well as transcriptional regulation by JUN and TP53, and apoptosis-related genes possibly lead to necrosis. CONCLUSION The data analysis, gene selection and prediction approaches permitted grouping of the classes of rat liver samples exhibiting necrosis to improve the accuracy of predicting the level of necrosis as a phenotypic end-point observed from the exposure. The strategy, along with pathway analysis and gene network reconstruction, led to the identification of 1) expression profiles of genes as a signature of necrosis and 2) perturbed regulatory processes that exhibited biological relevance to the manifestation of necrosis from exposure of rat livers to the compendium of hepatotoxicants.
Affiliation(s)
- Lingkang Huang
- Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina, USA.
|