1
Lee YJ, Kim WJ, Lee SH, Kim JH, Kwon SJ, Ahn JW, Kim SH, Kim JB, Lyu JI, Bae CH, Ryu J. Weighted gene co-expression network analysis-based selection of hub genes related to phenolic and volatile compounds and seed coat color in sorghum. BMC PLANT BIOLOGY 2025; 25:682. [PMID: 40410657 PMCID: PMC12100830 DOI: 10.1186/s12870-025-06657-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Accepted: 04/30/2025] [Indexed: 05/25/2025]
Abstract
BACKGROUND Sorghum grains are rich in phenolic compounds, which are noted for their anticancer, antioxidant, and anti-inflammatory properties, as well as volatile compounds (VOCs) that contribute to aroma and fermentation processes. Sorghum seed coat color is known to be closely related to phenolic compound content (PCC), particularly that of flavonoids, which are pigments that confer red and purple colors in flowers and seeds. RESULTS Black seeds had the highest total tannin content (TTC) and ketone content, measured at 457.7 mg CE g⁻¹ and 96 g 100 g⁻¹, respectively, which were 4.87- and 1.35-fold higher than those of white seeds. L* was negatively correlated with TTC (r = -0.770, P < 0.01) and ketone content (r = -0.814, P < 0.01), while TFC and a* showed a strong positive correlation (r = 0.829, P < 0.001). RNA sequencing analysis identified 1,422 up-regulated and 1,586 down-regulated differentially expressed genes. Weighted gene co-expression analysis highlighted two color-related gene modules: the magenta 2 module, associated with TTC, TPC, VOCs, and L* values, and the blue module, associated with TFC and a* values. Hub genes identified within these modules included ABCB28 in the magenta 2 module and PTCD1 and ANK in the blue module. CONCLUSIONS We confirmed the relationship between PCC, VOCs, and seed coat color: darker seed coats showed higher tannin and ketone contents, and redder seed coats indicated higher flavonoid content. Network analysis helped pinpoint key genes involved in these traits. This study provides essential data for improving the food and industrial use of sorghum.
Affiliation(s)
- Ye-Jin Lee
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
  - Department of Plant Production Sciences, Graduate School, Sunchon National University, Suncheon, 57922, Republic of Korea
- Woon Ji Kim
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Seung Hyeon Lee
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
  - Department of Integrative Food, Bioscience and Biotechnology, Chonnam National University, Gwangju, 61186, Republic of Korea
- Jae Hoon Kim
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Soon-Jae Kwon
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Joon-Woo Ahn
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Sang Hoon Kim
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Jin-Baek Kim
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
- Jae Il Lyu
  - Department of Agricultural Biotechnology, Rural Development Administration (RDA), National Institute of Agricultural Sciences, Jeonju, 54874, Republic of Korea
- Chang-Hyu Bae
  - Department of Plant Production Sciences, Graduate School, Sunchon National University, Suncheon, 57922, Republic of Korea
- Jaihyunk Ryu
  - Advanced Radiation Technology Institute, Korea Atomic Energy Research Institute, Jeongeup, 56212, Republic of Korea
2
Cheng X, Meng X, Chen R, Song Z, Li S, Wei S, Lv H, Zhang S, Tang H, Jiang Y, Zhang R. The molecular subtypes of autoimmune diseases. Comput Struct Biotechnol J 2024; 23:1348-1363. [PMID: 38596313 PMCID: PMC11001648 DOI: 10.1016/j.csbj.2024.03.026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 03/27/2024] [Accepted: 03/27/2024] [Indexed: 04/11/2024] Open
Abstract
Autoimmune diseases (ADs) are characterized by their complexity and a wide range of clinical differences. Despite patients presenting with similar symptoms and disease patterns, their reactions to treatments may vary. The current approach of personalized medicine, which relies on molecular data, is seen as an effective method to address the variability in these diseases. This review examined the pathologic classification of ADs, such as multiple sclerosis and lupus nephritis, over time. Acknowledging the limitations inherent in pathologic classification, the focus shifted to molecular classification to achieve a deeper insight into disease heterogeneity. The study outlined the established methods and findings from the molecular classification of ADs, categorizing systemic lupus erythematosus (SLE) into four subtypes, inflammatory bowel disease (IBD) into two, rheumatoid arthritis (RA) into three, and multiple sclerosis (MS) into a single subtype. It was observed that the high inflammation subtype of IBD, the RA inflammation subtype, and the MS "inflammation & EGF" subtype share similarities. These subtypes all display a consistent pattern of inflammation that is primarily driven by the activation of the JAK-STAT pathway, with the effective drugs being those that target this signaling pathway. Additionally, by identifying markers that are uniquely associated with the various subtypes within the same disease, the study was able to describe the differences between subtypes in detail. The findings are expected to contribute to the development of personalized treatment plans for patients and establish a strong basis for tailored approaches to treating autoimmune diseases.
Affiliation(s)
- Zerun Song
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuai Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Shuhao Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Hao Tang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
- Ruijie Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
3
Hemandhar Kumar S, Tapken I, Kuhn D, Claus P, Jung K. bootGSEA: a bootstrap and rank aggregation pipeline for multi-study and multi-omics enrichment analyses. FRONTIERS IN BIOINFORMATICS 2024; 4:1380928. [PMID: 38633435 PMCID: PMC11021641 DOI: 10.3389/fbinf.2024.1380928] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 03/18/2024] [Indexed: 04/19/2024] Open
Abstract
Introduction: Gene set enrichment analysis (GSEA) subsequent to differential expression analysis is a standard step in transcriptomics and proteomics data analysis. Although many tools for this step are available, the results are often difficult to reproduce because set annotations can change in the databases, that is, new features can be added or existing features can be removed. Such changes in set compositions can, in turn, have an impact on biological interpretation. Methods: We present bootGSEA, a novel computational pipeline, to study the robustness of GSEA. By repeating GSEA based on bootstrap samples, the variability and robustness of results can be studied. In our pipeline, not all genes or proteins are involved in the different bootstrap replicates of the analyses. Finally, we aggregate the ranks from the bootstrap replicates to obtain a score per gene set that shows whether it gains or loses evidence compared to the ranking of the standard GSEA. Rank aggregation is also used to combine GSEA results from different omics levels or from multiple independent studies at the same omics level. Results: By applying our approach to six independent cancer transcriptomics datasets, we showed that bootstrap GSEA can aid in the selection of more robust enriched gene sets. Additionally, we applied our approach to paired transcriptomics and proteomics data obtained from a mouse model of spinal muscular atrophy (SMA), a neurodegenerative and neurodevelopmental disease associated with multi-system involvement. After obtaining a robust ranking at both omics levels, both ranking lists were combined to aggregate the findings from the transcriptomics and proteomics results. Furthermore, we constructed the new R-package "bootGSEA," which implements the proposed methods and provides graphical views of the findings. In the example datasets, bootstrap-based GSEA was able to identify gene or protein sets that were less robust when the set composition changed during bootstrap analysis. Discussion: The rank aggregation step was useful for combining bootstrap results and making them comparable to the original findings on the single-omics level or for combining findings from multiple different omics levels.
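The bootstrap-and-aggregate idea in the abstract above can be illustrated with a minimal Python sketch. This is not the bootGSEA R package or its API. The set statistic (mean score of set members minus the mean of all sampled genes), the resampling of genes with replacement, and mean-rank aggregation are all simplifying assumptions chosen for illustration, and the function names `enrichment_score`, `rank_sets`, and `bootstrap_mean_ranks` are hypothetical.

```python
import random
from statistics import mean


def enrichment_score(gene_scores, gene_set):
    """Toy set statistic (an assumption, not the GSEA running-sum statistic):
    mean score of the set's members minus the mean score of all genes."""
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members:
        # No member of the set was sampled in this replicate.
        return float("-inf")
    return mean(members) - mean(gene_scores.values())


def rank_sets(gene_scores, gene_sets):
    """Rank gene sets by the toy statistic; rank 1 is the strongest set."""
    ordered = sorted(
        gene_sets,
        key=lambda name: enrichment_score(gene_scores, gene_sets[name]),
        reverse=True,
    )
    return {name: position + 1 for position, name in enumerate(ordered)}


def bootstrap_mean_ranks(gene_scores, gene_sets, n_boot=200, seed=0):
    """Resample genes with replacement, re-rank the sets in each replicate,
    and aggregate by the mean rank across replicates."""
    rng = random.Random(seed)
    genes = sorted(gene_scores)
    totals = {name: 0 for name in gene_sets}
    for _ in range(n_boot):
        # Sampling with replacement leaves some genes out of each replicate.
        sampled = set(rng.choices(genes, k=len(genes)))
        subset = {g: gene_scores[g] for g in sampled}
        for name, rank in rank_sets(subset, gene_sets).items():
            totals[name] += rank
    return {name: totals[name] / n_boot for name in gene_sets}


if __name__ == "__main__":
    # Hypothetical per-gene scores (e.g. test statistics) and two gene sets.
    scores = {f"g{i}": (3.0 if i < 5 else -3.0 if i >= 15 else 0.0) for i in range(20)}
    sets = {"A": {f"g{i}" for i in range(5)}, "B": {f"g{i}" for i in range(15, 20)}}
    print(rank_sets(scores, sets))             # ranking on the full data
    print(bootstrap_mean_ranks(scores, sets))  # aggregated bootstrap ranking
```

Comparing each set's mean bootstrap rank with its rank on the full data gives a rough indication of whether the set gains or loses evidence when the set composition changes between replicates.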
Affiliation(s)
- Shamini Hemandhar Kumar
  - Institute for Animal Genomics, University of Veterinary Medicine, Foundation, Hannover, Germany
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
- Ines Tapken
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
- Daniela Kuhn
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
  - Clinic for Conservative Dentistry, Periodontology and Preventive Dentistry, Hannover Medical School, Hannover, Germany
- Peter Claus
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
  - SMATHERIA gGmbH—Non-Profit Biomedical Research Institute, Hannover, Germany
- Klaus Jung
  - Institute for Animal Genomics, University of Veterinary Medicine, Foundation, Hannover, Germany
  - Center for Systems Neuroscience (ZSN), University of Veterinary Medicine, Foundation, Hannover, Germany
4
Cai L, Huang X, Feng H, Fan G, Sun X. Antimicrobial mechanisms of g-C3N4@ZnO against oomycetes Phytophthora capsici: from its metabolism, membrane structures and growth. PEST MANAGEMENT SCIENCE 2024; 80:2096-2108. [PMID: 38135506 DOI: 10.1002/ps.7946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2023] [Revised: 11/25/2023] [Accepted: 12/23/2023] [Indexed: 12/24/2023]
Abstract
BACKGROUND Phytophthora capsici, a refractory model oomycete plant pathogen, threatens multiple vegetable crops. A limited number of chemical pesticides play a vital role in controlling oomycete plant diseases. However, this approach often leads to excessive use of chemical agents, exacerbates environmental issues, and gives rise to more and more drug-resistant oomycete strains. Therefore, it is imperative to devise innovative solutions that can effectively address oomycete infection while maintaining high environmental sustainability and low toxicity. RESULTS In this study, a g-C3N4@ZnO heterostructure was synthesized and characterized. The g-C3N4@ZnO showed higher toxicity to Phytophthora capsici than graphitic carbon nitride (g-C3N4) nanosheets and zinc oxide (ZnO) nanoparticles in vitro and in vivo. Apart from the hyphal growth of Phytophthora capsici, its spore germination rate, sporangium formation, and number of spores were all suppressed by the g-C3N4@ZnO heterostructure. Furthermore, we found that this g-C3N4@ZnO heterostructure has higher photocatalytic activity under visible light, which potentially enhanced the reactive oxygen species (ROS)-mediated stress on Phytophthora capsici. Ultrastructural morphology, global changes in gene expression, and weighted gene co-expression network analysis all supported that the anti-oomycete activity of g-C3N4@ZnO was manifested in the destruction of the membrane system and inhibition of multiple metabolic processes of Phytophthora capsici under visible irradiation, which could also be attributed to ROS- and zinc ion (Zn2+)-mediated stress. CONCLUSION This work offers a novel oomycete disease management strategy using g-C3N4@ZnO, whose activity was attributed to ROS stress, destruction of the membrane system, and inhibition of multiple metabolic processes. © 2023 Society of Chemical Industry.
Affiliation(s)
- Lin Cai
  - Guizhou Key Laboratory for Tobacco Quality, College of Tobacco Science of Guizhou University, Guiyang, China
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals of Guizhou University, Guiyang, China
- Xunliang Huang
  - National Key Laboratory of Green Pesticide, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Center for R&D of Fine Chemicals of Guizhou University, Guiyang, China
- Hui Feng
  - Guizhou Key Laboratory for Tobacco Quality, College of Tobacco Science of Guizhou University, Guiyang, China
- Guangjin Fan
  - College of Plant Protection, Southwest University, Chongqing, China
- Xianchao Sun
  - College of Plant Protection, Southwest University, Chongqing, China
5
Ibrahim S, Ahmad N, Kuang L, Li K, Tian Z, Sadau SB, Tajo SM, Wang X, Wang H, Dun X. Transcriptome analysis reveals key regulatory genes for root growth related to potassium utilization efficiency in rapeseed (Brassica napus L.). FRONTIERS IN PLANT SCIENCE 2023; 14:1194914. [PMID: 37546248 PMCID: PMC10400329 DOI: 10.3389/fpls.2023.1194914] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/27/2023] [Accepted: 07/03/2023] [Indexed: 08/08/2023]
Abstract
Root system architecture (RSA) is the primary predictor of nutrient intake and significantly influences potassium utilization efficiency (KUE). Uncertainty persists regarding the genetic factors governing root growth in rapeseed. Root transcriptome analysis can reveal the genetic basis driving crop root growth. In this study, RNA-seq was used to profile the overall transcriptome in the root tissue of 20 Brassica napus accessions with high and low KUE. A total of 71,437 genes in the roots displayed variable expression profiles between the two contrasting genotype groups. A pairwise comparison approach identified 212 genes whose expression levels varied between the high and low KUE lines. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) functional classification analysis revealed that the DEGs implicated in hormone and signaling pathways, as well as glucose, lipid, and amino acid metabolism, were all differently regulated in the rapeseed root system. Additionally, we discovered that 33 transcription factors (TFs) that control root development were differentially expressed. By combining differential expression analysis, weighted gene co-expression network analysis (WGCNA), and recent genome-wide association study (GWAS) results, four candidate genes were identified as essential hub genes. These potential genes were located fewer than 100 kb from the peak SNPs of QTL clusters, and it was hypothesized that they regulated the formation of the root system. Three of the four hub genes' homologs (BnaC04G0560400ZS, BnaC04G0560400ZS, and BnaA03G0073500ZS) have been shown to control root development in earlier research. The information produced by our transcriptome profiling could be useful in revealing the molecular processes involved in the growth of rapeseed roots in response to KUE.
Affiliation(s)
- Sani Ibrahim
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
  - Department of Plant Biology, Faculty of Life Sciences, College of Natural and Pharmaceutical Sciences, Bayero University, Kano, Nigeria
- Nazir Ahmad
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Lieqiong Kuang
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Keqi Li
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Ze Tian
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Salisu Bello Sadau
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (Institute of Cotton Research (ICR), CAAS), Anyang, China
- Sani Muhammad Tajo
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences (Institute of Cotton Research (ICR), CAAS), Anyang, China
- Xinfa Wang
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Hanzhong Wang
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
- Xiaoling Dun
  - Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture and Rural Affairs, Oil Crops Research Institute of the Chinese Academy of Agricultural Sciences, Wuhan, China
6
Qu J, Runcie D, Cheng H. Mega-scale Bayesian regression methods for genome-wide prediction and association studies with thousands of traits. Genetics 2023; 223:6931802. [PMID: 36529897 PMCID: PMC9991502 DOI: 10.1093/genetics/iyac183] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2022] [Revised: 05/06/2022] [Accepted: 11/17/2022] [Indexed: 12/23/2022] Open
Abstract
Large-scale phenotype data are expected to increase the accuracy of genome-wide prediction and the power of genome-wide association analyses. However, genomic analyses of high-dimensional, highly correlated traits are challenging. We developed a method for implementing high-dimensional Bayesian multivariate regression to simultaneously analyze genetic variants underlying thousands of traits. As a demonstration, we implemented the BayesC prior in the R package MegaLMM. Applied to Genomic Prediction, MegaBayesC effectively integrated hyperspectral reflectance data from 620 hyperspectral wavelengths to improve the accuracy of genetic value prediction on grain yield in a wheat dataset. Applied to Genome-Wide Association Studies, we used simulations to show that MegaBayesC can accurately estimate the effect sizes of QTL across a range of genetic architectures and causes of correlations among traits. To apply MegaBayesC to a realistic scenario involving whole-genome marker data, we developed a 2-stage procedure involving a preliminary step of candidate marker selection prior to multivariate regression. We then used MegaBayesC to identify genetic associations with flowering time in Arabidopsis thaliana, leveraging expression data from 20,843 genes. MegaBayesC selected 15 single nucleotide polymorphisms as important for flowering time, with 13 located within 100 kb of known flowering-time related genes, a higher validation rate than achieved by a single-stage analysis using only the flowering time data itself. These results demonstrate that MegaBayesC can efficiently and effectively leverage high-dimensional phenotypes in genetic analyses.
Affiliation(s)
- Jiayi Qu
  - Department of Animal Science, University of California Davis, Davis, CA 95616, USA
- Daniel Runcie
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
- Hao Cheng
  - Department of Plant Sciences, University of California Davis, Davis, CA 95616, USA
7
Akdemir D, Somo M, Isidro-Sanchéz J. An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices. AXIOMS 2023; 12:161. [PMID: 37284612 PMCID: PMC10243021 DOI: 10.3390/axioms12020161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes of multiple science disciplines. One of these challenges is the harmonization of high-dimensional unbalanced and heterogeneous data. In this manuscript, we propose a statistical approach to combine incomplete and partially-overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices sampled from Wishart distributions and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method by (i) using simulation studies and (ii) using empirical datasets. In general, being able to make inferences about the covariance of variables not observed in the same experiment is a valuable tool for data analysis since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
Affiliation(s)
- Deniz Akdemir
  - Center of International Bone Marrow Transplantation Research, Minneapolis, MN 55401-1206, USA
- Julio Isidro-Sanchéz
  - Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, 28223, Madrid, Spain
8
Defo J, Awany D, Ramesar R. From SNP to pathway-based GWAS meta-analysis: do current meta-analysis approaches resolve power and replication in genetic association studies? Brief Bioinform 2023; 24:6972298. [PMID: 36611240 DOI: 10.1093/bib/bbac600] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Revised: 11/30/2022] [Accepted: 12/06/2022] [Indexed: 01/09/2023] Open
Abstract
Genome-wide association studies (GWAS) have benefited greatly from enhanced high-throughput technology in recent decades. GWAS meta-analysis has become increasingly popular to highlight the genetic architecture of complex traits, informing about the replicability and variability of effect estimations across human ancestries. A wealth of GWAS meta-analysis methodologies have been developed depending on the input data and the outcome information of interest. We present a survey of current approaches from SNP to pathway-based meta-analysis by acknowledging the range of resources and methodologies in the field, and we provide a comprehensive review of different categories of Genome-Wide Meta-analysis methods employed. These methods highlight different levels at which GWAS meta-analysis may be done, including Single Nucleotide Polymorphisms, Genes and Pathways, for which we describe their framework outline. We also discuss the strengths and pitfalls of each approach and make suggestions regarding each of them.
Affiliation(s)
- Joel Defo
  - Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, 7925, Observatory, South Africa
  - South African Medical Research Council Genomic and Personalized Medicine Research Unit
- Denis Awany
  - South African Tuberculosis Vaccine Initiative (SATVI), University of Cape Town, 7925, South Africa
- Raj Ramesar
  - Division of Human Genetics, Department of Pathology, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, 7925, Observatory, South Africa
  - South African Medical Research Council Genomic and Personalized Medicine Research Unit
9
Wang Y, Wang Y, Liu X, Zhou J, Deng H, Zhang G, Xiao Y, Tang W. WGCNA Analysis Identifies the Hub Genes Related to Heat Stress in Seedling of Rice (Oryza sativa L.). Genes (Basel) 2022; 13:genes13061020. [PMID: 35741784 PMCID: PMC9222641 DOI: 10.3390/genes13061020] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 05/17/2022] [Revised: 05/30/2022] [Accepted: 06/01/2022] [Indexed: 02/01/2023] Open
Abstract
Frequent high temperature weather affects the growth and development of rice, resulting in the decline of seed-setting rate, deterioration of rice quality and reduction of yield. Although some high temperature tolerance genes have been cloned, there is still little success in solving the effects of high temperature stress in rice (Oryza sativa L.). Based on the transcriptional data of seven time points, the weighted correlation network analysis (WGCNA) method was used to construct a co-expression network of differentially expressed genes (DEGs) between the rice genotypes IR64 (tolerant to heat stress) and Koshihikari (susceptible to heat stress). There were four modules in both genotypes that were highly correlated with the time points after heat stress in the seedling. We further identified candidate hub genes through clustering and analysis of protein interaction network with known-core genes. The results showed that the ribosome and protein processing in the endoplasmic reticulum were the common pathways in response to heat stress between the two genotypes. The changes of starch and sucrose metabolism and the biosynthesis of secondary metabolites pathways are possible reasons for the sensitivity to heat stress for Koshihikari. Our findings provide an important reference for the understanding of high temperature response mechanisms and the cultivation of high temperature resistant materials.
Affiliation(s)
- Yubo Wang
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Yingfeng Wang
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Xiong Liu
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Jieqiang Zhou
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Huabing Deng
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Guilian Zhang
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
- Yunhua Xiao
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
  - Correspondence: (Y.X.); (W.T.)
- Wenbang Tang
  - College of Agronomy, Hunan Agricultural University, Changsha 410128, China
  - State Key Laboratory of Hybrid Rice, Hunan Hybrid Rice Research Center, Changsha 410125, China
  - Correspondence: (Y.X.); (W.T.)
10
Mary-Huard T, Das S, Mukhopadhyay I, Robin S. Querying multiple sets of P-values through composed hypothesis testing. Bioinformatics 2021; 38:141-148. [PMID: 34478490 DOI: 10.1093/bioinformatics/btab592] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2020] [Revised: 07/16/2021] [Accepted: 07/27/2021] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION Combining the results of different experiments to exhibit complex patterns or to improve statistical power is a typical aim of data integration. The starting point of the statistical analysis often comes as a set of P-values resulting from previous analyses, that need to be combined flexibly to explore complex hypotheses, while guaranteeing a low proportion of false discoveries. RESULTS We introduce the generic concept of composed hypothesis, which corresponds to an arbitrary complex combination of simple hypotheses. We rephrase the problem of testing a composed hypothesis as a classification task and show that finding items for which the composed null hypothesis is rejected boils down to fitting a mixture model and classifying the items according to their posterior probabilities. We show that inference can be efficiently performed and provide a thorough classification rule to control for type I error. The performance and the usefulness of the approach are illustrated in simulations and on two different applications. The method is scalable, does not require any parameter tuning, and provided valuable biological insight on the considered application cases. AVAILABILITY AND IMPLEMENTATION The QCH methodology is available in the qch package hosted on CRAN. Additionally, R codes to reproduce the Einkorn example are available on the personal webpage of the first author: https://www6.inrae.fr/mia-paris/Equipes/Membres/Tristan-Mary-Huard. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tristan Mary-Huard
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France.,Génétique Quantitative et Evolution (GQE)-Le Moulon, Universite Paris-Saclay, INRAE, CNRS, AgroParisTech, Gif-sur-Yvette 91190, France
| | - Sarmistha Das
- Human Genetics Unit, Indian Statistical Institute, Kolkata 700108, India
| | | | - Stéphane Robin
- Mathématiques et informatique appliqués (MIA)-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris 75231, France.,Centre d'Écologie et des Sciences de la Conservation (CESCO), MNHN, CNRS, Sorbonne Université, Paris 75005, France
|
11
|
Campbell NR, Rao A, Hunter MV, Sznurkowska MK, Briker L, Zhang M, Baron M, Heilmann S, Deforet M, Kenny C, Ferretti LP, Huang TH, Perlee S, Garg M, Nsengimana J, Saini M, Montal E, Tagore M, Newton-Bishop J, Middleton MR, Corrie P, Adams DJ, Rabbie R, Aceto N, Levesque MP, Cornell RA, Yanai I, Xavier JB, White RM. Cooperation between melanoma cell states promotes metastasis through heterotypic cluster formation. Dev Cell 2021; 56:2808-2825.e10. [PMID: 34529939 DOI: 10.1016/j.devcel.2021.08.018] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 10/13/2020] [Revised: 07/07/2021] [Accepted: 08/20/2021] [Indexed: 02/08/2023]
Abstract
Melanomas can have multiple coexisting cell states, including proliferative (PRO) versus invasive (INV) subpopulations that represent a "go or grow" trade-off; however, how these populations interact is poorly understood. Using a combination of zebrafish modeling and analysis of patient samples, we show that INV and PRO cells form spatially structured heterotypic clusters and cooperate in the seeding of metastasis, maintaining cell state heterogeneity. INV cells adhere tightly to each other and form clusters with a rim of PRO cells. Intravital imaging demonstrated cooperation in which INV cells facilitate dissemination of less metastatic PRO cells. We identified the TFAP2 neural crest transcription factor as a master regulator of clustering and PRO/INV states. Isolation of clusters from patients with metastatic melanoma revealed a subset with heterotypic PRO-INV clusters. Our data suggest a framework for the co-existence of these two divergent cell populations, in which heterotypic clusters promote metastasis via cell-cell cooperation.
Affiliation(s)
- Nathaniel R Campbell
- Weill Cornell/Rockefeller Memorial Sloan Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA; Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA; Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Anjali Rao
- Institute for Computational Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Miranda V Hunter
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Magdalena K Sznurkowska
- Department of Biology, Institute of Molecular Health Sciences, Swiss Federal Institute of Technology (ETH) Zurich, 8093 Zurich, Switzerland
| | - Luzia Briker
- Department of Dermatology, University of Zürich Hospital, University of Zürich, Zurich, Switzerland
| | - Maomao Zhang
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Maayan Baron
- Institute for Computational Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Silja Heilmann
- Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Maxime Deforet
- Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Colin Kenny
- Department of Anatomy and Cell Biology, University of Iowa, Iowa City, IA 52242, USA
| | - Lorenza P Ferretti
- Department of Dermatology, University of Zürich Hospital, University of Zürich, Zurich, Switzerland; Department of Molecular Mechanisms of Disease, University of Zürich, Zurich, Switzerland
| | - Ting-Hsiang Huang
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Sarah Perlee
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Manik Garg
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridgeshire, UK
| | - Jérémie Nsengimana
- Leeds Institute of Medical Research at St. James's, University of Leeds School of Medicine, Leeds, UK
| | - Massimo Saini
- Department of Biology, Institute of Molecular Health Sciences, Swiss Federal Institute of Technology (ETH) Zurich, 8093 Zurich, Switzerland
| | - Emily Montal
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Mohita Tagore
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Julia Newton-Bishop
- Leeds Institute of Medical Research at St. James's, University of Leeds School of Medicine, Leeds, UK
| | - Mark R Middleton
- Oxford NIHR Biomedical Research Centre and Department of Oncology, University of Oxford, Oxford, UK
| | - Pippa Corrie
- Cambridge Cancer Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
| | - David J Adams
- Experimental Cancer Genetics, the Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Roy Rabbie
- Cambridge Cancer Centre, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK; Experimental Cancer Genetics, the Wellcome Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Nicola Aceto
- Department of Biology, Institute of Molecular Health Sciences, Swiss Federal Institute of Technology (ETH) Zurich, 8093 Zurich, Switzerland
| | - Mitchell P Levesque
- Department of Dermatology, University of Zürich Hospital, University of Zürich, Zurich, Switzerland
| | - Robert A Cornell
- Department of Anatomy and Cell Biology, University of Iowa, Iowa City, IA 52242, USA
| | - Itai Yanai
- Institute for Computational Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
| | - Joao B Xavier
- Computational and Systems Biology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
| | - Richard M White
- Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
|
12
|
Baggiolini A, Callahan SJ, Montal E, Weiss JM, Trieu T, Tagore MM, Tischfield SE, Walsh RM, Suresh S, Fan Y, Campbell NR, Perlee SC, Saurat N, Hunter MV, Simon-Vermot T, Huang TH, Ma Y, Hollmann T, Tickoo SK, Taylor BS, Khurana E, Koche RP, Studer L, White RM. Developmental chromatin programs determine oncogenic competence in melanoma. Science 2021; 373:eabc1048. [PMID: 34516843 DOI: 10.1126/science.abc1048] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Indexed: 12/14/2022]
Abstract
[Figure: see text].
Affiliation(s)
- Arianna Baggiolini
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Scott J Callahan
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Gerstner Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Emily Montal
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Joshua M Weiss
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Weill Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA
| | - Tuan Trieu
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA.,Department of Physiology and Biophysics, Weill Cornell Medicine, 1300 York Avenue, New York, NY 10065, USA.,Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA.,Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Mohita M Tagore
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Sam E Tischfield
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Ryan M Walsh
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Shruthy Suresh
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Yujie Fan
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
| | - Nathaniel R Campbell
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Weill Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA
| | - Sarah C Perlee
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Gerstner Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Nathalie Saurat
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Miranda V Hunter
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Theresa Simon-Vermot
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Ting-Hsiang Huang
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Yilun Ma
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Weill Cornell/Rockefeller/Sloan-Kettering Tri-Institutional MD-PhD Program, New York, NY 10065, USA
| | - Travis Hollmann
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Satish K Tickoo
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Barry S Taylor
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Marie-Josée and Henry R. Kravis Center for Molecular Oncology, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Joan & Sanford I. Weill Medical College of Cornell University, Cornell University, New York, NY, USA
| | - Ekta Khurana
- Institute for Computational Biomedicine, Weill Cornell Medical College, New York, NY 10021, USA.,Department of Physiology and Biophysics, Weill Cornell Medicine, 1300 York Avenue, New York, NY 10065, USA.,Caryl and Israel Englander Institute for Precision Medicine, Weill Cornell Medicine, New York, NY, USA.,Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA
| | - Richard P Koche
- Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Lorenz Studer
- Center for Stem Cell Biology and Developmental Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Gerstner Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Richard M White
- Department of Cancer Biology and Genetics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Weill Cornell Graduate School of Medical Sciences, Cornell University, New York, NY 10065, USA
|
13
|
Yuan L, Sun T, Zhao J, Shen Z. A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources. Front Genet 2021; 12:696956. [PMID: 34267783 PMCID: PMC8276077 DOI: 10.3389/fgene.2021.696956] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 04/18/2021] [Accepted: 05/24/2021] [Indexed: 11/13/2022] Open
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, due to the complex mechanism of path association and the lack of sufficient samples, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data provides us with an opportunity to design a new machine learning framework to predict potential disease-related CNVs. In this paper, we developed a novel machine learning approach, namely, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict the CNV-disease path associations by using a data set containing CNV, disease state labels, and gene data. CNVs, genes, and diseases are connected through edges and then constitute a biological association network. To construct a biological network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. Then, we used logistic regression with L1 penalty (LLR) function to detect genes related to disease. We added stability selection strategy, which can effectively reduce false positives, when using self-adaptive BM and LLR. Finally, a weighted path search algorithm was applied to find top D path associations and important CNVs. The experimental results on both simulation and prostate cancer data show that IHI-BMLLR is significantly better than two state-of-the-art CNV detection methods (i.e., CCRET and DPtest) under false-positive control. Furthermore, we applied IHI-BMLLR to prostate cancer data and found significant path associations. Three new cancer-related genes were discovered in the paths, and these genes need to be verified by biological research in the future.
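For orientation, the biweight midcorrelation mentioned in this abstract can be sketched as below. This is the standard (non-adaptive) definition, with the conventional tuning constant 9 applied to the median absolute deviation; the "self-adaptive" variant used by the authors is not specified in the abstract and is not reproduced here. The function name `bicor` is chosen for illustration.

```python
import numpy as np

def bicor(x, y):
    """Standard biweight midcorrelation between two 1-D samples.

    Assumption: tuning constant 9 on the median absolute deviation (MAD);
    this is not the self-adaptive variant described in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def normalized(v):
        med = np.median(v)
        mad = np.median(np.abs(v - med))
        if mad == 0:
            raise ValueError("MAD is zero; biweight weights are undefined")
        u = (v - med) / (9.0 * mad)
        # Points with |u| >= 1 are treated as outliers and get zero weight.
        w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
        a = (v - med) * w
        return a / np.sqrt(np.sum(a**2))

    return float(np.sum(normalized(x) * normalized(y)))
```

Because the deviations are centered on the median and down-weighted by distance from it, a few extreme values have far less influence on the coefficient than they would on a Pearson correlation.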
Affiliation(s)
- Lin Yuan
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Tao Sun
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Jing Zhao
- School of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China
| | - Zhen Shen
- School of Computer and Software, Nanyang Institute of Technology, Nanyang, China
|
14
|
Chang SM, Yang M, Lu W, Huang YJ, Huang Y, Hung H, Miecznikowski JC, Lu TP, Tzeng JY. Gene-Set Integrative Analysis of Multi-Omics Data Using Tensor-based Association Test. Bioinformatics 2021; 37:2259-2265. [PMID: 33674827 PMCID: PMC8388036 DOI: 10.1093/bioinformatics/btab125] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Revised: 12/30/2020] [Accepted: 02/24/2021] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Facilitated by technological advances and the decrease in costs, it is feasible to gather subject data from several omics platforms. Each platform assesses different molecular events, and the challenge lies in efficiently analyzing these data to discover novel disease genes or mechanisms. A common strategy is to regress the outcomes on all omics variables in a gene set. However, this approach suffers from problems associated with high-dimensional inference. RESULTS We introduce a tensor-based framework for variable-wise inference in multi-omics analysis. By accounting for the matrix structure of an individual's multi-omics data, the proposed tensor methods incorporate the relationship among omics effects, reduce the number of parameters, and boost the modeling efficiency. We derive the variable-specific tensor test and enhance computational efficiency of tensor modeling. Using simulations and data applications on the Cancer Cell Line Encyclopedia (CCLE), we demonstrate our method performs favorably over baseline methods and will be useful for gaining biological insights in multi-omics analysis. AVAILABILITY AND IMPLEMENTATION R function and instruction are available from the authors' website: https://www4.stat.ncsu.edu/~jytzeng/Software/TR.omics/TRinstruction.pdf. SUPPLEMENTARY INFORMATION Supplementary materials are available at Bioinformatics online.
Affiliation(s)
- Sheng-Mao Chang
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan
| | - Meng Yang
- Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
| | - Wenbin Lu
- Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA
| | - Yu-Jyun Huang
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| | - Yueyang Huang
- Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
| | - Hung Hung
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| | | | - Tzu-Pin Lu
- Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| | - Jung-Ying Tzeng
- Department of Statistics, National Cheng Kung University, Tainan, Taiwan.,Department of Statistics, North Carolina State University, Raleigh NC, 27695, USA.,Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan.,Bioinformatics Research Center, North Carolina State University, Raleigh NC, 27695, USA
|
15
|
Wang YXR, Li L, Li JJ, Huang H. Network Modeling in Biology: Statistical Methods for Gene and Brain Networks. Stat Sci 2021; 36:89-108. [PMID: 34305304 PMCID: PMC8296984 DOI: 10.1214/20-sts792] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/14/2022]
Abstract
The rise of network data in many different domains has offered researchers new insight into the problem of modeling complex systems and propelled the development of numerous innovative statistical methodologies and computational tools. In this paper, we primarily focus on two types of biological networks, gene networks and brain networks, where statistical network modeling has found both fruitful and challenging applications. Unlike other network examples such as social networks where network edges can be directly observed, both gene and brain networks require careful estimation of edges using covariates as a first step. We provide a discussion on existing statistical and computational methods for edge estimation and subsequent statistical inference problems in these two types of biological networks.
Affiliation(s)
- Y X Rachel Wang
- School of Mathematics and Statistics, University of Sydney, Australia
| | - Lexin Li
- Department of Biostatistics and Epidemiology, School of Public Health, University of California, Berkeley
| | | | - Haiyan Huang
- Department of Statistics, University of California, Berkeley
|
16
|
Lin DY, Zeng D, Couper D. A general framework for integrative analysis of incomplete multiomics data. Genet Epidemiol 2020; 44:646-664. [PMID: 32691502 PMCID: PMC7951090 DOI: 10.1002/gepi.22328] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/26/2019] [Revised: 03/05/2020] [Accepted: 05/29/2020] [Indexed: 12/21/2022]
Abstract
There is a tremendous current interest in measuring multiple types of omics features (e.g., DNA sequences, RNA expressions, methylation profiles, metabolic profiles, protein expressions) on a large number of subjects. Although genotypes are typically available for all study subjects, other data types may be measured only on a subset of subjects due to cost or other constraints. In addition, quantitative omics measurements, such as metabolite levels and protein expressions, are subject to detection limits in that the measurements below (or above) certain thresholds are not detectable. In this article, we propose a rigorous and powerful approach to handle missing values and detection limits in integrative analysis of multiomics data. We relate quantitative omics variables to genetic variants and other variables through linear regression models and relate phenotypes to quantitative omics variables and other variables through generalized linear models. We derive the joint-likelihood for the two sets of models by allowing arbitrary patterns of missing values and detection limits for quantitative omics variables. We carry out maximum-likelihood estimation through computationally fast and stable algorithms. The resulting estimators are approximately unbiased and statistically efficient. An application to a major study on chronic obstructive lung disease yielded new biological insights.
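How a detection limit enters a likelihood can be illustrated with a minimal sketch. It assumes a single normally distributed omics measurement with a lower detection limit, so that undetectable values contribute the probability of falling below the limit instead of a density. In the framework summarized above the mean would come from a linear regression on genetic variants and other variables; here `mu` is simply passed in, and the function names are hypothetical.

```python
import math

def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(z):
    """Cumulative distribution function of the standard normal at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def censored_loglik(values, below_limit, limit, mu, sigma):
    """Log-likelihood of normal data with a lower detection limit.

    values: measurements (ignored where below_limit is True)
    below_limit: flags marking measurements that fell below the limit
    """
    total = 0.0
    for y, censored in zip(values, below_limit):
        if censored:
            # Only "y < limit" is known, so use P(Y < limit).
            total += math.log(normal_cdf((limit - mu) / sigma))
        else:
            total += normal_logpdf((y - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing such a function over `mu` and `sigma` uses the censored observations instead of discarding them or substituting an arbitrary value.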
Affiliation(s)
- Dan-Yu Lin
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
| | - Donglin Zeng
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
| | - David Couper
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina
|
17
|
Sun J, Du P, Miao H, Liang H. Robust feature screening procedures for single and mixed types of data. J STAT COMPUT SIM 2020. [DOI: 10.1080/00949655.2020.1719104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Affiliation(s)
- Jinhui Sun
- Ping An Technology Co., Ltd., Beijing, People's Republic of China
| | - Pang Du
- Department of Statistics, Virginia Tech, Blacksburg, VA, USA
| | - Hongyu Miao
- Department of Biostatistics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Hua Liang
- Department of Statistics, George Washington University, Washington, DC, USA
|
18
|
Temprine K, Campbell NR, Huang R, Langdon EM, Simon-Vermot T, Mehta K, Clapp A, Chipman M, White RM. Regulation of the error-prone DNA polymerase Polκ by oncogenic signaling and its contribution to drug resistance. Sci Signal 2020; 13:13/629/eaau1453. [PMID: 32345725 DOI: 10.1126/scisignal.aau1453] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Indexed: 12/13/2022]
Abstract
The DNA polymerase Polκ plays a key role in translesion synthesis, an error-prone replication mechanism. Polκ is overexpressed in various tumor types. Here, we found that melanoma and lung and breast cancer cells experiencing stress from oncogene inhibition up-regulated the expression of Polκ and shifted its localization from the cytoplasm to the nucleus. This effect was phenocopied by inhibition of the kinase mTOR, by induction of ER stress, or by glucose deprivation. In unstressed cells, Polκ is continually transported out of the nucleus by exportin-1. Inhibiting exportin-1 or overexpressing Polκ increased the abundance of nuclear-localized Polκ, particularly in response to the BRAFV600E-targeted inhibitor vemurafenib, which decreased the cytotoxicity of the drug in BRAFV600E melanoma cells. These observations were analogous to how Escherichia coli encountering cell stress and nutrient deprivation can up-regulate and activate DinB/pol IV, the bacterial ortholog of Polκ, to induce mutagenesis that enables stress tolerance or escape. However, we found that the increased expression of Polκ was not excessively mutagenic, indicating that noncatalytic or other functions of Polκ could mediate its role in stress responses in mammalian cells. Repressing the expression or nuclear localization of Polκ might prevent drug resistance in some cancer cells.
Affiliation(s)
- Kelsey Temprine
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Nathaniel R Campbell
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Tri-Institutional M.D./Ph.D. Program, Weill Cornell Medical College, New York, NY 10065, USA
| | - Richard Huang
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Erin M Langdon
- University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
| | - Theresa Simon-Vermot
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Krisha Mehta
- Division of General Internal Medicine, Department of Medicine, Weill Cornell Medical College, New York, NY 10065, USA
| | | | - Mollie Chipman
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.,Gerstner Sloan Kettering Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
| | - Richard M White
- Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA.
|
19
|
Duggal P, Ladd-Acosta C, Ray D, Beaty TH. The Evolving Field of Genetic Epidemiology: From Familial Aggregation to Genomic Sequencing. Am J Epidemiol 2019; 188:2069-2077. [PMID: 31509181 PMCID: PMC7036654 DOI: 10.1093/aje/kwz193] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/28/2019] [Revised: 08/15/2019] [Accepted: 08/19/2019] [Indexed: 12/21/2022] Open
Abstract
The field of genetic epidemiology is relatively young and brings together genetics, epidemiology, and biostatistics to identify and implement the best study designs and statistical analyses for identifying genes controlling risk for complex and heterogeneous diseases (i.e., those where genes and environmental risk factors both contribute to etiology). The field has moved quickly over the past 40 years partly because the technology of genotyping and sequencing has forced it to adapt while adhering to the fundamental principles of genetics. In the last two decades, the available tools for genetic epidemiology have expanded from a genetic focus (considering 1 gene at a time) to a genomic focus (considering the entire genome), and now they must further expand to integrate information from other “-omics” (e.g., epigenomics, transcriptomics as measured by RNA expression) at both the individual and the population levels. Additionally, we can now also evaluate gene and environment interactions across populations to better understand exposure and the heterogeneity in disease risk. The future challenges facing genetic epidemiology are considerable both in scale and techniques, but the importance of the field will not diminish because by design it ties scientific goals with public health applications.
Affiliation(s)
- Priya Duggal
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Christine Ladd-Acosta
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Debashree Ray
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
| | - Terri H Beaty
- Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
|
20
|
Shi Q, Hu B, Zeng T, Zhang C. Multi-view Subspace Clustering Analysis for Aggregating Multiple Heterogeneous Omics Data. Front Genet 2019; 10:744. [PMID: 31497031 PMCID: PMC6712585 DOI: 10.3389/fgene.2019.00744] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 12/21/2018] [Accepted: 07/16/2019] [Indexed: 12/18/2022] Open
Abstract
Integration of distinct biological data types could provide a comprehensive view of biological processes or complex diseases. The combinations of molecules responsible for different phenotypes form multiple embedded (expression) subspaces, thus identifying the intrinsic data structure is challenging by regular integration methods. In this paper, we propose a novel framework of “Multi-view Subspace Clustering Analysis (MSCA),” which could measure the local similarities of samples in the same subspace and obtain the global consensus sample patterns (structures) for multiple data types, thereby comprehensively capturing the underlying heterogeneity of samples. Applied to various synthetic datasets, MSCA performs effectively to recognize the predefined sample patterns, and is robust to data noises. Given a real biological dataset, i.e., Cancer Cell Line Encyclopedia (CCLE) data, MSCA successfully identifies cell clusters of common aberrations across cancer types. A remarkable superiority over the state-of-the-art methods, such as iClusterPlus, SNF, and ANF, has also been demonstrated in our simulation and case studies.
Affiliation(s)
- Qianqian Shi
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, China
| | - Bing Hu
- Department of Applied Mathematics, College of Science, Zhejiang University of Technology, Hangzhou, China
| | - Tao Zeng
- Key Laboratory of Systems Biology, Institute of Biochemistry and Cell Biology, Shanghai Institute of Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Shanghai, China
|
21
|
Rojo C, Zhang Q, Keleş S. iFunMed: Integrative functional mediation analysis of GWAS and eQTL studies. Genet Epidemiol 2019; 43:742-760. [PMID: 31328826 DOI: 10.1002/gepi.22217] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/21/2019] [Revised: 04/17/2019] [Accepted: 05/07/2019] [Indexed: 11/08/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified thousands of genetic variants contributing to disease and other phenotypes. However, significant obstacles hamper our ability to elucidate causal variants, identify genes affected by causal variants, and characterize the mechanisms by which genotypes influence phenotypes. The increasing availability of genome-wide functional annotation data is providing unique opportunities to incorporate prior information into the analysis of GWAS to better understand the impact of variants on disease etiology. Although there have been many advances in incorporating prior information into prioritization of trait-associated variants in GWAS, functional annotation data have played a secondary role in the joint analysis of GWAS and molecular (i.e., expression) quantitative trait loci (eQTL) data in assessing evidence for association. To address this, we develop a novel mediation framework, iFunMed, to integrate GWAS and eQTL data with the utilization of publicly available functional annotation data. iFunMed extends the scope of standard mediation analysis by incorporating information from multiple genetic variants at a time and leveraging variant-level summary statistics. Data-driven computational experiments convey how informative annotations improve single-nucleotide polymorphism (SNP) selection performance while emphasizing robustness of iFunMed to noninformative annotations. Application to Framingham Heart Study data indicates that iFunMed is able to boost detection of SNPs with mediation effects that can be attributed to regulatory mechanisms.
Affiliation(s)
- Constanza Rojo
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin
| | - Qi Zhang
- Department of Statistics, University of Nebraska-Lincoln, Lincoln, Nebraska
| | - Sündüz Keleş
- Department of Statistics, University of Wisconsin-Madison, Madison, Wisconsin
- Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, Wisconsin
| |
|
22
|
Chu SH, Huang M, Kelly RS, Benedetti E, Siddiqui JK, Zeleznik OA, Pereira A, Herrington D, Wheelock CE, Krumsiek J, McGeachie M, Moore SC, Kraft P, Mathé E, Lasky-Su J. Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: An Epidemiological Perspective. Metabolites 2019; 9:E117. [PMID: 31216675 PMCID: PMC6630728 DOI: 10.3390/metabo9060117] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 05/08/2019] [Revised: 06/12/2019] [Accepted: 06/14/2019] [Indexed: 12/30/2022] Open
Abstract
It is not controversial that study design considerations and challenges must be addressed when investigating the linkage between single omic measurements and human phenotypes. It follows that such considerations are just as critical, if not more so, in the context of multi-omic studies. In this review, we discuss (1) epidemiologic principles of study design, including selection of biospecimen source(s) and the implications of the timing of sample collection, in the context of a multi-omic investigation, and (2) the strengths and limitations of various techniques of data integration across multi-omic data types that may arise in population-based studies utilizing metabolomic data.
Affiliation(s)
- Su H Chu
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| | - Mengna Huang
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| | - Rachel S Kelly
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| | - Elisa Benedetti
- Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA.
| | - Jalal K Siddiqui
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
| | - Oana A Zeleznik
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| | - Alexandre Pereira
- Department of Genetics and Molecular Medicine, University of Sao Paulo Medical School, Sao Paulo 01246-903, Brazil.
| | - David Herrington
- Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA.
| | - Craig E Wheelock
- Division of Physiological Chemistry 2, Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden.
| | - Jan Krumsiek
- Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA.
| | - Michael McGeachie
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| | - Steven C Moore
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA.
| | - Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA.
| | - Ewy Mathé
- Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA.
| | - Jessica Lasky-Su
- Channing Division of Network Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA.
| |
|
23
|
Aouiche C, Chen B, Shang X. Predicting stage-specific cancer related genes and their dynamic modules by integrating multiple datasets. BMC Bioinformatics 2019; 20:194. [PMID: 31074385 PMCID: PMC6509867 DOI: 10.1186/s12859-019-2740-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND The mechanisms of many complex diseases have not been accurately characterized in terms of their stage evolution. Previous studies mainly focus on the identification of associations between genes and individual diseases, but less is known about their associations with specific disease stages. Exploring biological modules through different disease stages could provide valuable knowledge to genomic and clinical research. RESULTS In this study, we proposed a powerful and versatile framework to identify stage-specific cancer related genes and their dynamic modules by integrating multiple datasets. The discovered modules and their specific-signature genes were significantly enriched in many relevant known pathways. To further illustrate the dynamic evolution of these clinical stages, a pathway network was built by taking individual pathways as vertices and the overlapping relationship between their annotated genes as edges. CONCLUSIONS The identified pathway network not only helps us to understand the functional evolution of complex diseases, but is also useful for clinical management in selecting the optimum treatment regimens and the appropriate drugs for patients.
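The pathway network described in the abstract (pathways as vertices, shared annotated genes as edges) can be illustrated with a small sketch. This is not the authors' implementation: the function name `build_pathway_network`, the `min_shared` threshold, and the toy pathway and gene labels are all hypothetical, chosen only to show the construction.

```python
from itertools import combinations


def build_pathway_network(pathway_genes, min_shared=1):
    """Connect two pathways when they share at least `min_shared` annotated genes.

    `pathway_genes` maps a pathway name to the set of genes annotated to it.
    Returns a dict mapping each (pathway_a, pathway_b) edge to its shared-gene count.
    The `min_shared` cutoff is an assumption made for illustration.
    """
    edges = {}
    # Visit every unordered pair of pathways once, in a stable (sorted) order.
    for a, b in combinations(sorted(pathway_genes), 2):
        shared = pathway_genes[a] & pathway_genes[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical toy annotation; the labels carry no biological meaning.
toy_pathways = {
    "pathway_A": {"g1", "g2", "g3"},
    "pathway_B": {"g2", "g3", "g4"},
    "pathway_C": {"g5", "g6"},
}
network = build_pathway_network(toy_pathways)
print(network)
```

In this toy input only `pathway_A` and `pathway_B` overlap, so the network has a single edge and `pathway_C` remains an isolated vertex. Weighting edges by the raw shared-gene count is one simple choice; a normalized overlap measure would be an equally plausible alternative.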
Affiliation(s)
- Chaima Aouiche
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University Ministry of Industry and Information Technology, Xi'an, China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University Ministry of Industry and Information Technology, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University Ministry of Industry and Information Technology, Xi'an, China
| |
|
24
|
Alzu'bi AA, Zhou L, Watzlaf VJM. Genetic Variations and Precision Medicine. PERSPECTIVES IN HEALTH INFORMATION MANAGEMENT 2019; 16:1a. [PMID: 31019429 PMCID: PMC6462879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The time and costs associated with the sequencing of a human genome have decreased significantly in recent years. Many people have chosen to have their genomes sequenced to receive genomics-based personalized healthcare services. To reach the goal of genomics-based precision medicine, health information management (HIM) professionals need to manage and analyze patients' genomic data. Two important pieces of information from the genome sequence are the risk of genetic diseases and the specific medication or pharmacogenomic results for the individual patient, both of which are linked to a patient's genetic variations. In this review article, we introduce genetic variations, including their data types, relevant databases, and some currently available analysis methods and systems. HIM professionals can choose to use these databases, methods, and systems in the management and analysis of patients' genomic data.
Affiliation(s)
- Amal Adel Alzu'bi
- The Department of Computer Information Systems at Jordan University of Science and Technology in Irbid, Jordan
| | - Leming Zhou
- The Department of Health Information Management at the University of Pittsburgh in Pittsburgh, PA
| | - Valerie J M Watzlaf
- The Department of Health Information Management at the University of Pittsburgh in Pittsburgh, PA
| |
|
25
|
Fan S, Tang J, Tian Q, Wu C. A robust fuzzy rule based integrative feature selection strategy for gene expression data in TCGA. BMC Med Genomics 2019; 12:14. [PMID: 30704464 PMCID: PMC6357346 DOI: 10.1186/s12920-018-0451-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Much research has been conducted on the selection of gene signatures that can distinguish cancer patients from normal controls. However, how to extract robust gene features remains an open question. METHODS In this work, a gene signature selection strategy for TCGA data was proposed by integrating the gene expression data, the methylation data and the prior knowledge about cancer biomarkers. Unlike the traditional integration method, the expanded 450 K methylation data were applied instead of the original 450 K array data, and the reported biomarkers were weighted in the feature selection. A fuzzy rule based classification method and a cross validation strategy were applied in the model construction for performance evaluation. RESULTS Our selected gene features showed prediction accuracy close to 100% in the cross validation with the fuzzy rule based classification model on 6 cancers from TCGA. The cross validation performance of our proposed model is similar to that of other integrative models or the RNA-seq only model, while the prediction performance on independent data is clearly better than that of the other 5 models. The gene signatures extracted with our fuzzy rule based integrative feature selection strategy were more robust, and had the potential to get better prediction results. CONCLUSION The results indicated that the integration of expanded methylation data would cover more genes, and had greater capacity to retrieve the signature genes compared with the original 450 K methylation data. Also, the integration of the reported biomarkers was a promising way to improve the performance. The PTCHD3 gene was selected as a discriminating gene in 3 out of the 6 cancers, which suggested that it might play an important role in cancer risk and would be worthy of intensive investigation.
Affiliation(s)
- Shicai Fan
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731 Sichuan China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, 611731 Sichuan China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
| | - Jianxiong Tang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731 Sichuan China
| | - Qi Tian
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, 611731 Sichuan China
| | - Chunguo Wu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, 130012 China
| |
|
26
|
Das S, Majumder PP, Chatterjee R, Chatterjee A, Mukhopadhyay I. A powerful method to integrate genotype and gene expression data for dissecting the genetic architecture of a disease. Genomics 2018; 111:1387-1394. [PMID: 30287403 DOI: 10.1016/j.ygeno.2018.09.011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/20/2018] [Revised: 08/14/2018] [Accepted: 09/17/2018] [Indexed: 01/17/2023]
Abstract
To decipher the genetic architecture of human disease, various types of omics data are generated. Two common types of omics data are genotypes and gene expression. Often genotype data for a large number of individuals and gene expression data for a few individuals are generated due to biological and technical reasons, leading to unequal sample sizes for different omics data. The unavailability of a standard statistical procedure for integrating such datasets motivates us to propose a two-step multi-locus association method using latent variables. Our method is more powerful than single/separate omics data analysis and it comprehensively unravels deep-seated signals through a single statistical model. Extensive simulation confirms that it is robust to various genetic models, as its power increases with sample size and number of associated loci. It computes p-values very quickly. Application to a real dataset on psoriasis identifies 17 novel SNPs, functionally related to psoriasis-associated genes, at a much smaller sample size than standard GWAS.
Affiliation(s)
- Sarmistha Das
- Human Genetics Unit, Indian Statistical Institute, Kolkata, India
| | | | | | | | | |
|
27
|
Qiu Z, Ye B, Yin L, Chen W, Xu Y, Chen X. Downregulation of AC061961.2, LING01-AS1, and RP11-13E1.5 is associated with dilated cardiomyopathy progression. J Cell Physiol 2018; 234:4460-4471. [PMID: 30203513 DOI: 10.1002/jcp.27247] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 05/03/2018] [Accepted: 08/19/2018] [Indexed: 01/16/2023]
Abstract
This study aimed to explore long noncoding RNAs (lncRNAs) implicated in dilated cardiomyopathy (DCM). Ten samples of failing hearts collected from the left ventricles of patients with DCM undergoing heart transplants, and ten control samples obtained from normal heart donors were included in this study. After sequencing, differentially expressed genes (DEGs) and lncRNAs between DCM and controls were screened, followed by functional enrichment analysis and weighted gene coexpression network analysis (WGCNA). Five key lncRNAs were validated through real-time polymerase chain reaction (PCR). A total of 1,398 DEGs were identified, including 267 lncRNAs. WGCNA identified seven modules that were significantly correlated with DCM. The top 50 genes in three of the modules (black, dark-green, and green-yellow) were significantly correlated with DCM disease state. Four core enrichment lncRNAs in the green-yellow module, including AC061961.2, LING01-AS1, and RP11-557H15.4, were associated with neurotransmitter secretion. Five core enrichment lncRNAs in the black module, including KB-1299A7.2 and RP11-13E1.5, were associated with the functions of blood circulation and heart contraction. AC061961.2, LING01-AS1, and RP11-13E1.5 were confirmed to be downregulated in DCM tissues by real-time PCR. The current study suggests that downregulation of AC061961.2, LING01-AS1, and RP11-13E1.5 may be associated with DCM progression, and that these lncRNAs may serve as key diagnostic biomarkers and therapeutic targets for DCM.
Affiliation(s)
- Zhibing Qiu
- Department of Cardiovascular Surgery, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Bin Ye
- Department of Anesthesiology, Yangzhou Maternal and Child Health Hospital, Yangzhou, Jiangsu, China
| | - Li Yin
- Department of Cardiovascular Surgery, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Wen Chen
- Department of Cardiovascular Surgery, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Yueyue Xu
- Department of Cardiovascular Surgery, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Xin Chen
- Department of Cardiovascular Surgery, Nanjing First Hospital, Nanjing Medical University, Nanjing, Jiangsu, China
| |
|
28
|
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2018; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022] Open
Abstract
Nowadays, the personalized approach to health care and cancer care in particular is becoming more and more popular and is taking an important place in the translational medicine paradigm. In some cases, detection of the patient-specific individual mutations that point to a targeted therapy has already become a routine practice for clinical oncologists. Wider panels of genetic markers are also on the market which cover a greater number of possible oncogenes including those with lower reliability of resulting medical conclusions. In light of the large availability of high-throughput technologies, it is very tempting to use complete patient-specific New Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold standard methods and protocols to evaluate them. Here we will discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single patient measurements. We will try to summarize the current state of the field focusing on the clinically relevant case-studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
- Personal Biomedicine, Moscow, Russia
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
| | - Mikhail Pyatnitskiy
- Personal Biomedicine, Moscow, Russia
- Orekhovich Institute of Biomedical Chemistry, Moscow, Russia
- Pirogov Russian National Research Medical University, Moscow, Russia
| | | | - Olga Kremenetskaya
- Personal Biomedicine, Moscow, Russia
- Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
| | - Dmitriy Vinogradov
- Personal Biomedicine, Moscow, Russia
- A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
- Lomonosov Moscow State University, Moscow, Russia
| |
|
29
|
Weiser M, Simon JM, Kochar B, Tovar A, Israel JW, Robinson A, Gipson GR, Schaner MS, Herfarth HH, Sartor RB, McGovern DP, Rahbar R, Sadiq TS, Koruda MJ, Furey TS, Sheikh SZ. Molecular classification of Crohn's disease reveals two clinically relevant subtypes. Gut 2018; 67:36-42. [PMID: 27742763 PMCID: PMC5426990 DOI: 10.1136/gutjnl-2016-312518] [Citation(s) in RCA: 84] [Impact Index Per Article: 12.0] [Received: 06/27/2016] [Revised: 09/09/2016] [Accepted: 09/18/2016] [Indexed: 12/12/2022]
Abstract
OBJECTIVE The clinical presentation and course of Crohn's disease (CD) is highly variable. We sought to better understand the cellular and molecular mechanisms that guide this heterogeneity, and characterise the cellular processes associated with disease phenotypes. DESIGN We examined both gene expression and gene regulation (chromatin accessibility) in non-inflamed colon tissue from a cohort of adult patients with CD and control patients. To support the generality of our findings, we analysed previously published expression data from a large cohort of treatment-naïve paediatric CD and control ileum. RESULTS We found that adult patients with CD clearly segregated into two classes based on colon tissue gene expression: one that largely resembled the normal colon and one where certain genes showed expression patterns normally specific to the ileum. These classes were supported by changes in gene regulatory profiles observed at the level of chromatin accessibility, reflective of a fundamental shift in underlying molecular phenotypes. Furthermore, gene expression from the ilea of a treatment-naïve cohort of paediatric patients with CD could be similarly subdivided into colon-like and ileum-like classes. Finally, expression patterns within these CD subclasses highlight large-scale differences in the immune response and aspects of cellular metabolism, and were associated with multiple clinical phenotypes describing disease behaviour, including rectal disease and need for colectomy. CONCLUSIONS Our results strongly suggest that these molecular signatures define two clinically relevant forms of CD irrespective of tissue sampling location, patient age or treatment status.
Affiliation(s)
- Matthew Weiser
- Department of Genetics, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
| | - Jeremy M. Simon
- Department of Genetics, University of North Carolina at Chapel Hill
| | - Bharati Kochar
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - Adelaide Tovar
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill
| | | | - Adam Robinson
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - Gregory R. Gipson
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - Matthew S. Schaner
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - Hans H. Herfarth
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - R. Balfour Sartor
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
| | - Dermot P.B. McGovern
- F. Widjaja Foundation Inflammatory Bowel and Immunobiology Research Institute, Cedars-Sinai Medical Center, Los Angeles, California
| | - Reza Rahbar
- Department of Surgery, University of North Carolina at Chapel Hill
| | - Timothy S. Sadiq
- Department of Surgery, University of North Carolina at Chapel Hill
| | - Mark J. Koruda
- Department of Surgery, University of North Carolina at Chapel Hill
| | - Terrence S. Furey
- Department of Genetics, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Department of Biology, University of North Carolina at Chapel Hill
| | - Shehzad Z. Sheikh
- Department of Genetics, University of North Carolina at Chapel Hill
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina at Chapel Hill
- Center for Gastrointestinal Biology and Disease, University of North Carolina at Chapel Hill
- Curriculum in Genetics and Molecular Biology, University of North Carolina at Chapel Hill
| |
|
30
|
Momtaz R, Ghanem NM, El-Makky NM, Ismail MA. Integrated analysis of SNP, CNV and gene expression data in genetic association studies. Clin Genet 2017; 93:557-566. [PMID: 28685831 DOI: 10.1111/cge.13092] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/20/2017] [Revised: 06/20/2017] [Accepted: 07/01/2017] [Indexed: 02/02/2023]
Abstract
Integrative approaches that combine multiple forms of data can more accurately capture pathway associations and so provide a comprehensive understanding of the molecular mechanisms that cause complex diseases. Association analyses based on single nucleotide polymorphism (SNP) genotypes, copy number variant (CNV) genotypes, and gene expression profiles are the 3 most common paradigms used for gene set/pathway enrichment analyses. Much work has been done to leverage information from 2 types of data from these 3 paradigms. However, to the best of our knowledge, no previous work has integrated all 3 paradigms together. In this article, we present an integrated analysis that combines SNP, CNV, and gene expression data to generate a single gene list. We present different methods to compare this gene list with the other 3 possible lists that result from the combinations of the following pairs of data: SNP genotype with gene expression, CNV genotype with gene expression, and SNP genotype with CNV genotype. The comparison is done using 3 different cancer datasets and 2 different methods of comparison. Our results show that integrating SNP, CNV, and gene expression data gives better association results than integrating any pair of the 3 data types.
Affiliation(s)
- R Momtaz
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
| | - N M Ghanem
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
| | - N M El-Makky
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
| | - M A Ismail
- Computer and Systems Engineering Department, Alexandria University, Alexandria, Egypt
| |
|
31
|
Association of Inflammatory Bowel Disease with Arthritis: Evidence from In Silico Gene Expression Patterns and Network Topological Analysis. Interdiscip Sci 2017; 11:387-396. [DOI: 10.1007/s12539-017-0272-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 07/07/2017] [Revised: 11/02/2017] [Accepted: 11/06/2017] [Indexed: 12/11/2022]
|
32
|
Kang M, Park J, Kim DC, Biswas AK, Liu C, Gao J. Multi-Block Bipartite Graph for Integrative Genomic Analysis. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2017; 14:1350-1358. [PMID: 27429442 DOI: 10.1109/tcbb.2016.2591521] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/06/2023]
Abstract
Human diseases involve a sequence of complex interactions between multiple biological processes. In particular, multiple genomic data such as Single Nucleotide Polymorphism (SNP), Copy Number Variation (CNV), DNA Methylation (DM), and their interactions simultaneously play an important role in human diseases. However, despite the widely known complex multi-layer biological processes and increased availability of the heterogeneous genomic data, most research has considered only a single type of genomic data. Furthermore, recent integrative genomic studies for the multiple genomic data have also been facing difficulties due to the high-dimensionality and complexity, especially when considering their intra- and inter-block interactions. In this paper, we introduce a novel multi-block bipartite graph and its inference methods, MB2I and sMB2I, for the integrative genomic study. The proposed methods not only integrate multiple genomic data but also incorporate intra/inter-block interactions by using a multi-block bipartite graph. In addition, the methods can be used to predict quantitative traits (e.g., gene expression, survival time) from the multi-block genomic data. The performance was assessed by simulation experiments that implement practical situations. We also applied the method to the human brain data of psychiatric disorders. The experimental results were analyzed by maximum edge biclique and biclustering, and biological findings were discussed.
|
33
|
Chappell GA, Israel JW, Simon JM, Pott S, Safi A, Eklund K, Sexton KG, Bodnar W, Lieb JD, Crawford GE, Rusyn I, Furey TS. Variation in DNA-Damage Responses to an Inhalational Carcinogen (1,3-Butadiene) in Relation to Strain-Specific Differences in Chromatin Accessibility and Gene Transcription Profiles in C57BL/6J and CAST/EiJ Mice. ENVIRONMENTAL HEALTH PERSPECTIVES 2017; 125:107006. [PMID: 29038090 PMCID: PMC5944832 DOI: 10.1289/ehp1937] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 03/21/2017] [Revised: 08/30/2017] [Accepted: 09/05/2017] [Indexed: 06/07/2023]
Abstract
BACKGROUND The damaging effects of exposure to environmental toxicants differentially affect genetically distinct individuals, but the mechanisms contributing to these differences are poorly understood. Genetic variation affects the establishment of the gene regulatory landscape and thus gene expression, and we hypothesized that this contributes to the observed heterogeneity in individual responses to exogenous cellular insults. OBJECTIVES We performed an in vivo study of how genetic variation and chromatin organization may dictate susceptibility to DNA damage, and influence the cellular response to such damage, caused by an environmental toxicant. MATERIALS AND METHODS We measured DNA damage, messenger RNA (mRNA) and microRNA (miRNA) expression, and genome-wide chromatin accessibility in lung tissue from two genetically divergent inbred mouse strains, C57BL/6J and CAST/EiJ, both in unexposed mice and in mice exposed to a model DNA-damaging chemical, 1,3-butadiene. RESULTS Our results showed that unexposed CAST/EiJ and C57BL/6J mice have very different chromatin organization and transcription profiles in the lung. Importantly, in unexposed CAST/EiJ mice, which acquired relatively less 1,3-butadiene-induced DNA damage, we observed increased transcription and a more accessible chromatin landscape around genes involved in detoxification pathways. Upon chemical exposure, chromatin was significantly remodeled in the lung of C57BL/6J mice, a strain that acquired higher levels of 1,3-butadiene-induced DNA damage, around the same genes, ultimately resembling the molecular profile of CAST/EiJ. CONCLUSIONS These results suggest that strain-specific changes in chromatin and transcription in response to chemical exposure lead to a "compensation" for underlying genetic-driven interindividual differences in the baseline chromatin and transcriptional state. 
This work represents an example of how chemical and environmental exposures can be evaluated to better understand gene-by-environment interactions, and it demonstrates the important role of chromatin response in transcriptomic changes and, potentially, in deleterious effects of exposure. https://doi.org/10.1289/EHP1937.
Affiliation(s)
- Grace A Chappell
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
- Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Jennifer W Israel
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Jeremy M Simon
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Sebastian Pott
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Alexias Safi
- Department of Pediatrics, Duke Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, USA
| | - Karl Eklund
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Kenneth G Sexton
- Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Wanda Bodnar
- Department of Environmental Sciences and Engineering, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Jason D Lieb
- Department of Human Genetics, University of Chicago, Chicago, Illinois, USA
| | - Gregory E Crawford
- Department of Pediatrics, Duke Center for Genomic and Computational Biology, Duke University, Durham, North Carolina, USA
| | - Ivan Rusyn
- Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, Texas, USA
| | - Terrence S Furey
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
- Department of Biology, University of North Carolina, Chapel Hill, North Carolina, USA
- UNC Lineberger Comprehensive Cancer Center, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
| |
|
34
|
Liu H, Zhou T, Wang B, Li L, Ye D, Yu S. Identification and functional analysis of a potential key lncRNA involved in fat loss of cancer cachexia. J Cell Biochem 2017; 119:1679-1688. [PMID: 28782835 DOI: 10.1002/jcb.26328] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 07/11/2017] [Accepted: 08/03/2017] [Indexed: 01/31/2023]
Abstract
Cancer cachexia is a devastating, multifactorial, and irreversible syndrome characterized by skeletal muscle reduction with or without fat loss. Although much attention has been focused on muscle wasting, fat loss may occur earlier and accelerate muscle wasting in cachexia. Because cachexia is the cause of 20% of cancer-related deaths, there is an urgent need to uncover the molecular mechanisms behind it. Here we applied weighted gene co-expression network analysis (WGCNA) to identify cachexia-related gene modules using 3289 differentially expressed genes and 59 long non-coding RNAs (lncRNAs) from microarray data of cachectic and non-cachectic subcutaneous adipose tissue. Sixteen independent modules were obtained, and the GSAASeqSP Toolset confirmed that the black module was significantly associated with fat loss in cancer cachexia. The top 50 hub genes in the black module contained only one lncRNA, VLDLR antisense RNA 1 (VLDLR-AS1). We then explored the function of the black module through the VLDLR-AS1-connected genes in the network. GO enrichment and KEGG pathway analysis revealed that VLDLR-AS1-connected genes were involved in the Wnt signaling pathway, small GTPase-mediated signal transduction, and epithelial-mesenchymal transition, among others. By constructing a competing endogenous RNA (ceRNA) regulatory network, we showed that VLDLR-AS1 may function with hsa-miR-600 to regulate GOLGA3, DUSP14, and UCHL1, or interact with hsa-miR-1224-3p to modulate the expression of GOLGA3, ZNF219, RNF141, and CALU. After literature validation, we predicted that VLDLR-AS1 most likely interacts with miR-600 to regulate UCHL1 through the Wnt/β-catenin signaling pathway. However, further experiments are still required to validate the mechanisms of VLDLR-AS1 in fat reduction in cancer cachexia.
Affiliation(s)
- Huiquan Liu
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Ting Zhou
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Bangyan Wang
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Lu Li
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Dawei Ye
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Shiying Yu
- Cancer Center, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| |
|
35
|
Chu SH, Huang YT. Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis. BMC Bioinformatics 2017; 18:336. [PMID: 28697753 PMCID: PMC5505153 DOI: 10.1186/s12859-017-1737-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/07/2017] [Accepted: 06/22/2017] [Indexed: 01/22/2023] Open
Abstract
Background Burgeoning interest in integrative analyses has produced a rise in studies that incorporate data from multiple genomic platforms. The literature on formal hypothesis testing at the integrative gene-set level is sparse. This paper is biologically motivated by our interest in the joint effects of epigenetic methylation loci and their associated mRNA expression on lung cancer survival status. Results We provide an efficient screening approach across multiplatform genomic data on the level of biologically related sets of genes, and our methods are applicable to various disease models regardless of whether the underlying true model is known (iTEGS) or unknown (iNOTE). Our proposed testing procedure dominated two competing methods. Using our methods, we identified a total of 28 gene sets with significant joint epigenomic and transcriptomic effects on one-year lung cancer survival. Conclusions We propose efficient variance component-based testing procedures to facilitate the joint testing of multiplatform genomic data across an entire gene set. The testing procedure for the gene set is self-contained, and can easily be extended to include more or different genetic platforms. iTEGS and iNOTE implemented in R are freely available through the inote package at https://cran.r-project.org//. Electronic supplementary material The online version of this article (doi:10.1186/s12859-017-1737-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Su Hee Chu
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Channing Division of Network Medicine, Brigham and Women's Hospital Harvard Medical School, 181 Longwood Ave, Boston, MA, USA
| | - Yen-Tsung Huang
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Department of Biostatistics, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Institute of Statistical Science, Academia Sinica, No. 128, Section 2, Academia Rd, Taipei City, Taiwan
| |
|
36
|
Markunas CA, Johnson EO, Hancock DB. Comprehensive evaluation of disease- and trait-specific enrichment for eight functional elements among GWAS-identified variants. Hum Genet 2017; 136:911-919. [PMID: 28567521 DOI: 10.1007/s00439-017-1815-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/18/2017] [Accepted: 05/22/2017] [Indexed: 01/17/2023]
Abstract
Genome-wide association study (GWAS)-identified variants are enriched for functional elements. However, we have limited knowledge of how functional enrichment may differ by disease/trait and tissue type. We tested a broad set of eight functional elements for enrichment among GWAS-identified SNPs (p < 5 × 10⁻⁸) from the NHGRI-EBI Catalog across seven disease/trait categories: cancer, cardiovascular disease, diabetes, autoimmune disease, psychiatric disease, neurological disease, and anthropometric traits. SNPs were annotated using HaploReg for the eight functional elements across any tissue: DNase sites, expression quantitative trait loci (eQTL), sequence conservation, enhancers, promoters, missense variants, sequence motifs, and protein binding sites. In addition, tissue-specific annotations were considered for brain vs. blood. Disease/trait SNPs were compared to a control set of 4809 SNPs matched to the GWAS SNPs (N = 1639) on allele frequency, gene density, distance to nearest gene, and linkage disequilibrium at ~3:1 ratio. Enrichment analyses were conducted using logistic regression, with Bonferroni correction. Overall, a significant enrichment was observed for all functional elements, except sequence motifs. Missense SNPs showed the strongest magnitude of enrichment. eQTLs were the only functional element significantly enriched across all diseases/traits. Magnitudes of enrichment were generally similar across diseases/traits, where enrichment was statistically significant. Blood vs. brain tissue effects on enrichment were dependent on disease/trait and functional element (e.g., cardiovascular disease: eQTLs P_TissueDifference = 1.28 × 10⁻⁶ vs. enhancers P_TissueDifference = 0.94). Identifying disease/trait-relevant functional elements and tissue types could provide new insight into the underlying biology, by guiding a priori GWAS analyses (e.g., brain enhancer elements for psychiatric disease) or facilitating post hoc interpretation.
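As a rough illustration of the Bonferroni step mentioned in the abstract above, the sketch below compares each p-value with alpha divided by the number of tests. The function name `bonferroni_significant`, the default alpha of 0.05, and the two-test family are assumptions made for illustration; the abstract does not state how many tests were corrected for. The two p-values are the tissue-difference values quoted in the abstract.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value passes a Bonferroni-adjusted threshold.

    p_values: mapping of test name -> raw p-value.
    alpha: family-wise error rate (0.05 is an assumed default).
    """
    if not p_values:
        return {}
    # Bonferroni: divide the family-wise alpha by the number of tests.
    threshold = alpha / len(p_values)
    return {name: p <= threshold for name, p in p_values.items()}


# Illustrative two-test family; the real analysis covered more tests.
example = {"eQTL": 1.28e-6, "enhancer": 0.94}
flags = bonferroni_significant(example)
```

With only two tests the adjusted threshold is 0.025, so the eQTL comparison is flagged and the enhancer comparison is not; a larger family of tests would lower the threshold further.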
Affiliation(s)
- Christina A Markunas
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA.
| | - Eric O Johnson
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA; Fellow Program, RTI International, Research Triangle Park, NC, USA
| | - Dana B Hancock
- Behavioral Health and Criminal Justice Division, RTI International, Research Triangle Park, NC, USA
| |
|
37
|
Pasaniuc B, Price AL. Dissecting the genetics of complex traits using summary association statistics. Nat Rev Genet 2017; 18:117-127. [PMID: 27840428 PMCID: PMC5449190 DOI: 10.1038/nrg.2016.142] [Citation(s) in RCA: 274] [Impact Index Per Article: 34.3] [Indexed: 02/07/2023]
Abstract
During the past decade, genome-wide association studies (GWAS) have been used to successfully identify tens of thousands of genetic variants associated with complex traits and diseases. These studies have produced extensive repositories of genetic variation and trait measurements across large numbers of individuals, providing tremendous opportunities for further analyses. However, privacy concerns and other logistical considerations often limit access to individual-level genetic data, motivating the development of methods that analyse summary association statistics. Here, we review recent progress on statistical methods that leverage summary association data to gain insights into the genetic basis of complex traits and diseases.
Affiliation(s)
- Bogdan Pasaniuc
- Departments of Human Genetics, and Pathology and Laboratory Medicine, University of California, Los Angeles, California 90095, USA
| | - Alkes L Price
- Departments of Epidemiology and Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts 02115, USA
- Program in Medical and Population Genetics, Broad Institute, Cambridge, Massachusetts 02142, USA
| |
|
38
|
Zhao SD, Cai TT, Cappola TP, Margulies KB, Li H. Sparse simultaneous signal detection for identifying genetically controlled disease genes. J Am Stat Assoc 2017; 112:1032-1046. [PMID: 29375169 PMCID: PMC5784841 DOI: 10.1080/01621459.2016.1270825] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/01/2015] [Accepted: 12/01/2016] [Indexed: 10/20/2022]
Abstract
Genome-wide association studies (GWAS) and differential expression analyses have had limited success in finding genes that cause complex diseases such as heart failure (HF), a leading cause of death in the United States. This paper proposes a new statistical approach that integrates GWAS and expression quantitative trait loci (eQTL) data to identify important HF genes. For such genes, genetic variations that perturb their expression are also likely to influence disease risk. The proposed method thus tests for the presence of simultaneous signals: SNPs that are associated with the gene's expression as well as with disease. An analytic expression for the p-value is obtained, and the method is shown to be asymptotically adaptively optimal under certain conditions. It also allows the GWAS and eQTL data to be collected from different groups of subjects, enabling investigators to integrate public resources with their own data. Simulation experiments show that it can be more powerful than standard approaches and also robust to linkage disequilibrium between variants. The method is applied to an extensive analysis of HF genomics and identifies several genes with biological evidence for being functionally relevant in the etiology of HF. It is implemented in the R package ssa.
Affiliation(s)
- Sihai Dave Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign
| | - T Tony Cai
- Department of Statistics, The Wharton School, University of Pennsylvania
| | - Thomas P Cappola
- Penn Cardiovascular Institute and Department of Medicine, Perelman School of Medicine, University of Pennsylvania
| | - Kenneth B Margulies
- Penn Cardiovascular Institute and Department of Medicine, Perelman School of Medicine, University of Pennsylvania
| | - Hongzhe Li
- Department of Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania
| |
|
39
|
Wang X, Sui W, Wu W, Hou X, Ou M, Xiang Y, Dai Y. Whole-genome resequencing of 100 healthy individuals using DNA pooling. Exp Ther Med 2016; 12:3143-3150. [PMID: 27882129 PMCID: PMC5103757 DOI: 10.3892/etm.2016.3797] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 04/24/2015] [Accepted: 08/11/2016] [Indexed: 12/27/2022] Open
Abstract
With the advent of next-generation sequencing technology, the cost of sequencing has significantly decreased. However, sequencing costs remain high for large-scale studies. In the present study, DNA pooling was applied as a cost-effective strategy for sequencing, and the results for 100 healthy individuals obtained via whole-genome resequencing with DNA pooling are presented. In order to minimise the likelihood of systematic bias in sampling, paired-end libraries with an insert size of 500 bp were prepared for all samples and then subjected to whole-genome sequencing using four lanes for each library, resulting in at least 30-fold haploid coverage for each sample. The NCBI human genome build 37 (hg19) was used as the reference genome, and the short reads were aligned to it, achieving 99.84% coverage. In addition, the average sequencing depth was 32.76. In total, ~3 million single-nucleotide polymorphisms were identified, of which 99.88% were in the NCBI dbSNP database. Furthermore, ~600,000 small insertions/deletions, 500,000 structural variants, 5,000 copy number variations and 13,000 single nucleotide variants were identified. According to the present study, the whole genome has been sequenced for a small sample of subjects from southern China for the first time. New variation sites were identified by comparison with the reference sequence, adding new knowledge of human genome variation to the human genomic databases. In addition, the particular distribution regions of variation were illustrated by analyzing various sites of variation, such as single-nucleotide polymorphisms.
Affiliation(s)
- Xiaobin Wang
- Health Management Centre, The Affiliated Guilin Hospital, Southern Medical University, Guilin, Guangxi 541000, P.R. China; Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China
| | - Weiguo Sui
- Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China
| | - Weiqing Wu
- Health Management Centre, The Second Clinical Medical College, Jinan University, Shenzhen, Guangdong 518001, P.R. China
| | - Xianliang Hou
- Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China; College of Life Science, Guangxi Normal University, Guilin, Guangxi 541001, P.R. China
| | - Minglin Ou
- Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China
| | - Yueying Xiang
- Health Management Centre, The Affiliated Guilin Hospital, Southern Medical University, Guilin, Guangxi 541000, P.R. China
| | - Yong Dai
- Guangxi Key Laboratory of Metabolic Diseases Research, Guilin, Guangxi 541000, P.R. China; Department of Nephrology, Guilin 181st Hospital, Guilin, Guangxi 541000, P.R. China; Clinical Medical Research Center, The Second Clinical Medical College, Jinan University, Shenzhen, Guangdong 518001, P.R. China
| |
|
40
|
Kao PYP, Leung KH, Chan LWC, Yip SP, Yap MKH. Pathway analysis of complex diseases for GWAS, extending to consider rare variants, multi-omics and interactions. Biochim Biophys Acta Gen Subj 2016; 1861:335-353. [PMID: 27888147 DOI: 10.1016/j.bbagen.2016.11.030] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 06/22/2016] [Revised: 10/17/2016] [Accepted: 11/19/2016] [Indexed: 12/20/2022]
Abstract
BACKGROUND Genome-wide association studies (GWAS) are a major method for studying the genetics of complex diseases. Finding all sequence variants to explain fully the aetiology of a disease is difficult because of their small effect sizes. To better explain disease mechanisms, pathway analysis is used to consolidate the effects of multiple variants, and hence increase the power of the study. While pathway analysis has previously been performed within GWAS only, it can now be extended to examining rare variants, other "-omics" and interaction data. SCOPE OF REVIEW 1. Factors to consider in the choice of software for GWAS pathway analysis. 2. Examples of how pathway analysis is used to analyse rare variants, other "-omics" and interaction data. MAJOR CONCLUSIONS To choose appropriate software tools, factors for consideration include covariate compatibility, null hypothesis, whether one- or two-step analysis is required, curation method of gene sets, size of pathways, and size of flanking regions to define gene boundaries. For rare variants, analysis performance depends on consistency between the assumed and actual effect distribution of variants. Integration of other "-omics" data and interaction data can better explain gene functions. GENERAL SIGNIFICANCE Pathway analysis methods will be more readily used for integration of multiple sources of data, and enable more accurate prediction of phenotypes.
Affiliation(s)
- Patrick Y P Kao
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Kim Hung Leung
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Lawrence W C Chan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China
| | - Shea Ping Yip
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong SAR, China.
| | - Maurice K H Yap
- Centre for Myopia Research, School of Optometry, The Hong Kong Polytechnic University, Hong Kong SAR, China
| |
|
41
|
Richardson S, Tseng GC, Sun W. Statistical Methods in Integrative Genomics. ANNUAL REVIEW OF STATISTICS AND ITS APPLICATION 2016; 3:181-209. [PMID: 27482531 PMCID: PMC4963036 DOI: 10.1146/annurev-statistics-041715-033506] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Indexed: 05/28/2023]
Abstract
Statistical methods in integrative genomics aim to answer important biology questions by jointly analyzing multiple types of genomic data (vertical integration) or aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale of these methods. We conclude with some summary points and future research directions.
Affiliation(s)
- Sylvia Richardson
- MRC Biostatistics Unit, Cambridge Institute of Public Health, University of Cambridge, CB2 0SR, United Kingdom
| | - George C. Tseng
- Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA 15261
| | - Wei Sun
- Department of Biostatistics, Department of Genetics, University of North Carolina, Chapel Hill, NC 27599
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, Washington 27516
| |
|
42
|
McDonald JU, Kaforou M, Clare S, Hale C, Ivanova M, Huntley D, Dorner M, Wright VJ, Levin M, Martinon-Torres F, Herberg JA, Tregoning JS. A Simple Screening Approach To Prioritize Genes for Functional Analysis Identifies a Role for Interferon Regulatory Factor 7 in the Control of Respiratory Syncytial Virus Disease. mSystems 2016; 1:e00051-16. [PMID: 27822537 PMCID: PMC5069771 DOI: 10.1128/msystems.00051-16] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 04/23/2016] [Accepted: 05/26/2016] [Indexed: 12/21/2022] Open
Abstract
Greater understanding of the functions of host gene products in response to infection is required. While many of these genes enable pathogen clearance, some enhance pathogen growth or contribute to disease symptoms. Many studies have profiled transcriptomic and proteomic responses to infection, generating large data sets, but selecting targets for further study is challenging. Here we propose a novel data-mining approach combining multiple heterogeneous data sets to prioritize genes for further study by using respiratory syncytial virus (RSV) infection as a model pathogen with a significant health care impact. The assumption was that the more frequently a gene is detected across multiple studies, the more important its role is. A literature search was performed to find data sets of genes and proteins that change after RSV infection. The data sets were standardized, collated into a single database, and then panned to determine which genes occurred in multiple data sets, generating a candidate gene list. This candidate gene list was validated by using both a clinical cohort and in vitro screening. We identified several genes that were frequently expressed following RSV infection with no assigned function in RSV control, including IFI27, IFIT3, IFI44L, GBP1, OAS3, IFI44, and IRF7. Drilling down into the function of these genes, we demonstrate a role in disease for the gene for interferon regulatory factor 7, which was highly ranked on the list, but not for IRF1, which was not. Thus, we have developed and validated an approach for collating published data sets into a manageable list of candidates, identifying novel targets for future analysis. IMPORTANCE Making the most of "big data" is one of the core challenges of current biology. There is a large array of heterogeneous data sets of host gene responses to infection, but these data sets do not inform us about gene function and require specialized skill sets and training for their utilization. 
Here we describe an approach that combines and simplifies these data sets, distilling this information into a single list of genes commonly upregulated in response to infection with RSV as a model pathogen. Many of the genes on the list have unknown functions in RSV disease. We validated the gene list with new clinical, in vitro, and in vivo data. This approach allows the rapid selection of genes of interest for further, more-detailed studies, thus reducing time and costs. Furthermore, the approach is simple to use and widely applicable to a range of diseases.
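The collation step described in the abstract above (standardize the data sets, pool them, and keep genes that recur across data sets) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `rank_genes_by_recurrence`, the `min_datasets` cutoff, and the toy input are hypothetical, and standardization is reduced to trimming and upper-casing gene symbols.

```python
from collections import Counter


def rank_genes_by_recurrence(datasets, min_datasets=2):
    """Rank genes by the number of data sets in which they appear.

    datasets: mapping of data set name -> iterable of gene symbols.
    min_datasets: hypothetical cutoff for keeping a gene as a candidate.
    """
    counts = Counter()
    for genes in datasets.values():
        # Normalize symbols and de-duplicate within a data set so that
        # each data set contributes at most one count per gene.
        counts.update({g.strip().upper() for g in genes})
    ranked = [(g, n) for g, n in counts.items() if n >= min_datasets]
    # Most frequently reported genes first; ties broken alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy input for illustration only; these lists are not data from the study.
toy = {
    "study_a": ["IRF7", "IFI27", "OAS3"],
    "study_b": ["irf7", "IFI27", "GBP1"],
    "study_c": ["IRF7", "IRF1"],
}
candidates = rank_genes_by_recurrence(toy)
```

Counting each data set once per gene keeps a single large study from dominating the ranking, which matches the stated assumption that recurrence across studies signals importance.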
Affiliation(s)
- Jacqueline U. McDonald
- Mucosal Infection and Immunity Group, Section of Virology, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Myrsini Kaforou
- Section of Paediatrics, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Simon Clare
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Christine Hale
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
| | - Maria Ivanova
- Mucosal Infection and Immunity Group, Section of Virology, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Derek Huntley
- Imperial College Centre for Integrative Systems Biology and Bioinformatics, Imperial College London, London, United Kingdom
| | - Marcus Dorner
- Molecular Virology, Section of Virology, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Victoria J. Wright
- Section of Paediatrics, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Michael Levin
- Section of Paediatrics, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - Federico Martinon-Torres
- Department of Paediatrics, Hospital Clínico Universitario de Santiago, Santiago de Compostela, Spain
| | - Jethro A. Herberg
- Section of Paediatrics, Imperial College London, St. Mary’s Campus, London, United Kingdom
| | - John S. Tregoning
- Mucosal Infection and Immunity Group, Section of Virology, Imperial College London, St. Mary’s Campus, London, United Kingdom
| |
|
43
|
Li D, Budoff MJ. Genetics paired with CT angiography in the setting of atherosclerosis. Clin Imaging 2016; 40:917-25. [PMID: 27183141 DOI: 10.1016/j.clinimag.2016.04.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/13/2015] [Revised: 03/14/2016] [Accepted: 04/21/2016] [Indexed: 12/31/2022]
Abstract
Coronary artery disease (CAD) continues to be the leading cause of morbidity and mortality globally. Although the etiological mechanisms of CAD have not been fully elucidated, most would agree that atherosclerotic plaques that progressively narrow the coronary arteries are the earliest manifestation and the principal cause of CAD. The emergence of revolutionary imaging technologies such as cardiac CT angiography, noninvasive computed fractional flow reserve, and intravascular ultrasound has made it possible to detect and monitor phenotypes associated with subclinical atherosclerosis. Meanwhile, the widespread use of high-throughput genotyping pipelines such as next-generation sequencing, combined with big data-driven solutions in bioinformatics, is translating emerging genetic technologies into clinical practice and thereby providing valuable insight into CAD. In this review, we briefly describe the latest noninvasive cardiac imaging techniques for detecting atherosclerosis-related phenotypes, focusing mainly on coronary artery calcification, plaque burden, and stenosis. Furthermore, we highlight state-of-the-art genotyping techniques and their application in CAD translational research. Finally, we discuss the clinical relevance of genetics paired with noninvasive imaging in the setting of coronary artery atherosclerosis.
Affiliation(s)
- Dong Li
- Los Angeles Biomedical Research Institute.
| | | |
|
44
|
Pan-cancer subtyping in a 2D-map shows substructures that are driven by specific combinations of molecular characteristics. Sci Rep 2016; 6:24949. [PMID: 27109935 PMCID: PMC4842960 DOI: 10.1038/srep24949] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/29/2015] [Accepted: 04/07/2016] [Indexed: 02/02/2023] Open
Abstract
The use of genome-wide data in cancer research, for the identification of groups of patients with similar molecular characteristics, has become a standard approach for applications in therapy-response, prognosis-prediction, and drug-development. To progress in these applications, the trend is to move from single genome-wide measurements in a single cancer-type towards measuring several different molecular characteristics across multiple cancer-types. Although current approaches shed light on molecular characteristics of various cancer-types, detailed relationships between patients within cancer clusters are unclear. We propose a novel multi-omic integration approach that exploits the joint behavior of the different molecular characteristics and supports both visual exploration of the data in a two-dimensional landscape and inspection of the contribution of the different genome-wide data-types. We integrated 4,434 samples across 19 cancer-types, derived from TCGA, containing gene expression, DNA-methylation, copy-number variation and microRNA expression data. Cluster analysis revealed 18 clusters, where three clusters showed a complex collection of cancer-types, squamous-cell-carcinoma, colorectal cancers, and a novel grouping of kidney-cancers. Sixty-four samples were identified outside their tissue-of-origin cluster. Known and novel patient subgroups were detected for acute myeloid leukemias and breast cancers. Quantification of the contributions of the different molecular types showed that substructures are driven by specific (combinations of) molecular characteristics.
|
45
|
Deng L, Hou L, Zhang J, Tang X, Cheng Z, Li G, Fang X, Xu J, Zhang X, Xu R. Polymorphism of rs3737597 in DISC1 Gene on Chromosome 1q42.2 in sALS Patients: a Chinese Han Population Case-Control Study. Mol Neurobiol 2016; 54:3162-3179. [DOI: 10.1007/s12035-016-9869-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/05/2016] [Accepted: 03/17/2016] [Indexed: 01/10/2023]
|
46
|
Keith SA, Maddux SK, Zhong Y, Chinchankar MN, Ferguson AA, Ghazi A, Fisher AL. Graded Proteasome Dysfunction in Caenorhabditis elegans Activates an Adaptive Response Involving the Conserved SKN-1 and ELT-2 Transcription Factors and the Autophagy-Lysosome Pathway. PLoS Genet 2016; 12:e1005823. [PMID: 26828939 PMCID: PMC4734690 DOI: 10.1371/journal.pgen.1005823] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 05/30/2015] [Accepted: 12/31/2015] [Indexed: 12/30/2022] Open
Abstract
The maintenance of cellular proteins in a biologically active and structurally stable state is a vital endeavor involving multiple cellular pathways. One such pathway is the ubiquitin-proteasome system that represents a major route for protein degradation, and reductions in this pathway usually have adverse effects on the health of cells and tissues. Here, we demonstrate that loss-of-function mutants of the Caenorhabditis elegans proteasome subunit, RPN-10, exhibit moderate proteasome dysfunction and unexpectedly develop both increased longevity and enhanced resistance to multiple threats to the proteome, including heat, oxidative stress, and the presence of aggregation prone proteins. The rpn-10 mutant animals survive through the activation of compensatory mechanisms regulated by the conserved SKN-1/Nrf2 and ELT-2/GATA transcription factors that mediate the increased expression of genes encoding proteasome subunits as well as those mediating oxidative- and heat-stress responses. Additionally, we find that the rpn-10 mutant also shows enhanced activity of the autophagy-lysosome pathway as evidenced by increased expression of the multiple autophagy genes including atg-16.2, lgg-1, and bec-1, and also by an increase in GFP::LGG-1 puncta. Consistent with a critical role for this pathway, the enhanced resistance of the rpn-10 mutant to aggregation prone proteins depends on autophagy genes atg-13, atg-16.2, and prmt-1. Furthermore, the rpn-10 mutant is particularly sensitive to the inhibition of lysosome activity via either RNAi or chemical means. We also find that the rpn-10 mutant shows a reduction in the numbers of intestinal lysosomes, and that the elt-2 gene also plays a novel and vital role in controlling the production of functional lysosomes by the intestine. 
Overall, these experiments suggest that moderate proteasome dysfunction could be leveraged to improve protein homeostasis and organismal health and longevity, and that the rpn-10 mutant provides a unique platform to explore these possibilities.
Affiliation(s)
- Scott A. Keith
  - Division of Geriatric Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Sarah K. Maddux
  - Division of Geriatrics, Gerontology, and Palliative Medicine, Department of Medicine, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
  - Center for Healthy Aging, Barshop Institute for Longevity and Aging Studies, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
- Yayu Zhong
  - Division of Geriatrics, Gerontology, and Palliative Medicine, Department of Medicine, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
  - Center for Healthy Aging, Barshop Institute for Longevity and Aging Studies, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
- Meghna N. Chinchankar
  - Division of Geriatrics, Gerontology, and Palliative Medicine, Department of Medicine, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
  - Center for Healthy Aging, Barshop Institute for Longevity and Aging Studies, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
- Annabel A. Ferguson
  - Division of Geriatric Medicine, Department of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Arjumand Ghazi
  - Rangos Research Center, Department of Pediatrics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Alfred L. Fisher
  - Division of Geriatrics, Gerontology, and Palliative Medicine, Department of Medicine, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
  - Center for Healthy Aging, Barshop Institute for Longevity and Aging Studies, The University of Texas Health Science Center at San Antonio (UTHSCSA), San Antonio, Texas, United States of America
  - San Antonio GRECC, South Texas VA Healthcare System, San Antonio, Texas, United States of America
|
47
|
Thingholm LB, Andersen L, Makalic E, Southey MC, Thomassen M, Hansen LL. Strategies for Integrated Analysis of Genetic, Epigenetic, and Gene Expression Variation in Cancer: Addressing the Challenges. Front Genet 2016; 7:2. [PMID: 26870081 PMCID: PMC4740898 DOI: 10.3389/fgene.2016.00002] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 06/07/2015] [Accepted: 01/11/2016] [Indexed: 12/15/2022]
Abstract
The development and progression of cancer, a collection of diseases with complex genetic architectures, are facilitated by the interplay of multiple etiological factors. This complexity challenges the traditional single-platform study design and calls for an integrated approach to data analysis. However, integration of heterogeneous measurements of biological variation is a non-trivial exercise due to the diversity of the human genome and the variety of output data formats and genome coverage obtained from the commonly used molecular platforms. This review article provides an introduction to integration strategies used for analyzing genetic risk factors for cancer. We critically examine the ability of these strategies to handle the complexity of the human genome and also accommodate information about the biological and functional interactions between the elements that have been measured, making the assessment of disease risk against a composite genomic factor possible. The focus of this review is to provide an overview and introduction to the main strategies and to discuss where there is a need for further development.
Affiliation(s)
- Louise B Thingholm
  - Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
  - Department of Biomedicine, The University of Aarhus, Aarhus, Denmark
- Lars Andersen
  - Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
- Enes Makalic
  - Centre for Epidemiology and Biostatistics, The University of Melbourne, Melbourne, VIC, Australia
- Melissa C Southey
  - Department of Pathology, The University of Melbourne, Melbourne, VIC, Australia
- Mads Thomassen
  - Department of Clinical Genetics, Odense University Hospital, Odense, Denmark
|
48
|
Sun YV, Hu YJ. Integrative Analysis of Multi-omics Data for Discovery and Functional Studies of Complex Human Diseases. ADVANCES IN GENETICS 2016; 93:147-90. [PMID: 26915271 DOI: 10.1016/bs.adgen.2015.11.004] [Citation(s) in RCA: 256] [Impact Index Per Article: 28.4] [Indexed: 02/06/2023]
Abstract
Complex and dynamic networks of molecules are involved in human diseases. High-throughput technologies enable omics studies interrogating thousands to millions of markers with similar biochemical properties (e.g., transcriptomics for RNA transcripts). However, a single layer of "omics" can only provide limited insights into the biological mechanisms of a disease. In the case of genome-wide association studies, although thousands of single nucleotide polymorphisms have been identified for complex diseases and traits, the functional implications and mechanisms of the associated loci are largely unknown. Additionally, the genomic variants alone are not able to explain the changing disease risk across the life span. DNA, RNA, proteins, and metabolites often have complementary roles to jointly perform a certain biological function. Such complementary effects and synergistic interactions between omic layers in the life course can only be captured by integrative study of multiple molecular layers. Building upon the success in single-omics discovery research, population studies started adopting the multi-omics approach to better understand molecular function and disease etiology. Multi-omics approaches integrate data obtained from different omic levels to understand their interrelation and combined influence on the disease processes. Here, we summarize major omics approaches available in population research, and review integrative approaches and methodologies interrogating multiple omic layers, which enhance the gene discovery and functional analysis of human diseases. We seek to provide analytical recommendations for different types of multi-omics data and study designs to guide the emerging multi-omic research, and to suggest improvement of the existing analytical methods.
Affiliation(s)
- Yan V Sun
  - Department of Epidemiology, Rollins School of Public Health, Atlanta, GA, United States
  - Department of Biomedical Informatics, School of Medicine, Atlanta, GA, United States
- Yi-Juan Hu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, United States
|
49
|
Maass K, Shekhar A, Lu J, Kang G, See F, Kim EE, Delgado C, Shen S, Cohen L, Fishman GI. Isolation and characterization of embryonic stem cell-derived cardiac Purkinje cells. Stem Cells 2016; 33:1102-12. [PMID: 25524238 DOI: 10.1002/stem.1921] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/28/2014] [Revised: 11/18/2014] [Accepted: 11/18/2014] [Indexed: 12/16/2022]
Abstract
The cardiac Purkinje fiber network is composed of highly specialized cardiomyocytes responsible for the synchronous excitation and contraction of the ventricles. Computational modeling, experimental animal studies, and intracardiac electrical recordings from patients with heritable and acquired forms of heart disease suggest that Purkinje cells (PCs) may also serve as critical triggers of life-threatening arrhythmias. Nonetheless, owing to the difficulty in isolating and studying this rare population of cells, the precise role of PCs in arrhythmogenesis and the underlying molecular mechanisms responsible for their proarrhythmic behavior are not fully characterized. Conceptually, a stem cell-based model system might facilitate studies of PC-dependent arrhythmia mechanisms and serve as a platform to test novel therapeutics. Here, we describe the generation of murine embryonic stem cells (ESCs) harboring pan-cardiomyocyte and PC-specific reporter genes. We demonstrate that the dual reporter gene strategy may be used to identify and isolate the rare ESC-derived PCs (ESC-PCs) from a mixed population of cardiogenic cells. ESC-PCs display transcriptional signatures and functional properties, including action potentials, intracellular calcium cycling, and chronotropic behavior, comparable to those of endogenous PCs. Our results suggest that stem cell-derived PCs are a feasible new platform for studies of developmental biology, disease pathogenesis, and screening for novel antiarrhythmic therapies.
Affiliation(s)
- Karen Maass
  - Leon H. Charney Division of Cardiology, New York University School of Medicine, New York, New York, USA
|
50
|
He H, Lin D, Zhang J, Wang Y, Deng HW. Biostatistics, Data Mining and Computational Modeling. TRANSLATIONAL BIOINFORMATICS 2016. [DOI: 10.1007/978-94-017-7543-4_2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
|