1. Illa SK, Mumtaz S, Nath S, Mukherjee S, Mukherjee A. Characterization of runs of homozygosity revealed genomic inbreeding and patterns of selection in indigenous Sahiwal cattle. J Appl Genet 2024; 65:167-180. [PMID: 38110827 DOI: 10.1007/s13353-023-00816-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 11/29/2023] [Accepted: 12/05/2023] [Indexed: 12/20/2023]
Abstract
Runs of homozygosity (ROH) are contiguous genomic regions, homozygous across all sites, which arise in an individual when the parents transmit identical haplotypes to their offspring. After decades of selection, the genetic improvement program of Sahiwal cattle needs a re-assessment of breeding strategy and population phenomena. Hence, the present study was carried out to optimize input parameters in PLINK for ROH estimates, to explore ROH islands, and to assess pedigree- and genome-based inbreeding in Sahiwal cattle. The sliding window approach, with parameters standardized to define ROH for the specific population under study, was used for the identification of runs. The optimum maximum gap, density, window-snp and window-threshold were 250 kb, 120 kb/SNP, 10 and 0.05, respectively, and ROH patterns were also characterized. ROH islands were defined as the short homozygous genomic regions shared by a large proportion of individuals in a population, containing significantly higher occurrences of ROH than the population-specific threshold level. These were identified using the --homozyg-group function of the PLINK v1.9 program. Our results indicated that the ROH islands harbor a few candidate genes (ACAD11, RFX4, BANP and UBA5) that are associated with major economic traits. The average FPED (pedigree-based inbreeding coefficient), FROH (genomic inbreeding coefficient), FHOM (inbreeding estimated as the ratio of observed and expected homozygous genotypes), FGRM (inbreeding estimated by the genomic relationship method) and FGRM0.5 (inbreeding estimated from the diagonal of a GRM with allele frequencies near 0.5) were 0.009, 0.091, 0.035, -0.104 and -0.009, respectively. Our study revealed the optimum parameter settings in PLINK, viz. maximal gap between two SNPs, minimal density of SNPs in a segment (in kb/SNP) and scanning window size to identify ROH segments, which will make ROH estimation more efficient and comparable across various SNP genotyping-based studies. The results further emphasized the significant role of genomics in unraveling population diversity, selection signatures and inbreeding in the ongoing Sahiwal breed improvement programs.
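As a rough illustration of the settings reported in this abstract, the sketch below assembles a PLINK 1.9 argument list with the stated gap, density, window-snp and window-threshold values, and computes FROH under the commonly used definition (total ROH length divided by the autosomal genome length covered by SNPs), which the abstract itself does not spell out. The file prefixes, the `--cow` species flag, and both function names are assumptions made for illustration only; the authors' full command line is not given in the source.

```python
from typing import Iterable, List


def plink_roh_command(bfile: str, out: str) -> List[str]:
    """Build a PLINK 1.9 argument list using the ROH settings quoted in the abstract.

    `bfile` and `out` are hypothetical file prefixes supplied by the caller.
    """
    return [
        "plink",
        "--cow",                               # assumption: cattle chromosome set
        "--bfile", bfile,
        "--homozyg", "group",                  # ROH scan plus pooling of overlapping runs
        "--homozyg-gap", "250",                # maximum gap between adjacent SNPs, in kb
        "--homozyg-density", "120",            # minimum density, in kb per SNP
        "--homozyg-window-snp", "10",          # scanning window size, in SNPs
        "--homozyg-window-threshold", "0.05",  # window hit-rate threshold
        "--out", out,
    ]


def f_roh(segment_lengths_kb: Iterable[float], autosome_length_kb: float) -> float:
    """Genomic inbreeding as summed ROH length over autosomal length (assumed definition)."""
    if autosome_length_kb <= 0:
        raise ValueError("autosome_length_kb must be positive")
    return sum(segment_lengths_kb) / autosome_length_kb
```

The argument list can be passed to `subprocess.run` once PLINK is installed; the segment lengths for `f_roh` would typically be read from the per-segment output that PLINK writes for each animal.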
Affiliation(s)
- Satish Kumar Illa
  - Livestock Research Station, Garividi, Sri Venkateswara Veterinary University, Tirupati, Andhra Pradesh State, India
- Shabahat Mumtaz
  - Animal Husbandry Department, Kolkata, West Bengal State, India
- Sapna Nath
  - College of Veterinary Science, Garividi, Sri Venkateswara Veterinary University, Tirupati, Andhra Pradesh State, India
- Sabyasachi Mukherjee
  - Animal Genetics & Breeding Division, Indian Council of Agricultural Research (ICAR)-National Dairy Research Institute (NDRI), Karnal, Haryana State, India
- Anupama Mukherjee
  - Animal Genetics & Breeding Division, Indian Council of Agricultural Research (ICAR)-National Dairy Research Institute (NDRI), Karnal, Haryana State, India

2. Emfinger CH, Clark LE, Yandell B, Schueler KL, Simonett SP, Stapleton DS, Mitok KA, Merrins MJ, Keller MP, Attie AD. Novel regulators of islet function identified from genetic variation in mouse islet Ca2+ oscillations. eLife 2023; 12:RP88189. [PMID: 37787501 PMCID: PMC10547476 DOI: 10.7554/elife.88189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/04/2023]
Abstract
Insufficient insulin secretion to meet metabolic demand results in diabetes. The intracellular flux of Ca2+ into β-cells triggers insulin release. Since genetics strongly influences variation in islet secretory responses, we surveyed islet Ca2+ dynamics in eight genetically diverse mouse strains. We found high strain variation in response to four conditions: (1) 8 mM glucose; (2) 8 mM glucose plus amino acids; (3) 8 mM glucose, amino acids, plus 10 nM glucose-dependent insulinotropic polypeptide (GIP); and (4) 2 mM glucose. These stimuli interrogate β-cell function, α- to β-cell signaling, and incretin responses. We then correlated components of the Ca2+ waveforms to islet protein abundances in the same strains used for the Ca2+ measurements. To focus on proteins relevant to human islet function, we identified human orthologues of correlated mouse proteins that are proximal to glycemic-associated single-nucleotide polymorphisms in human genome-wide association studies. Several orthologues have previously been shown to regulate insulin secretion (e.g. ABCC8, PCSK1, and GCK), supporting our mouse-to-human integration as a discovery platform. By integrating these data, we nominate novel regulators of islet Ca2+ oscillations and insulin secretion with potential relevance for human islet function. We also provide a resource for identifying appropriate mouse strains in which to study these regulators.
Affiliation(s)
- Lauren E Clark
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Brian Yandell
  - Department of Statistics, University of Wisconsin-Madison, Madison, United States
- Kathryn L Schueler
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Shane P Simonett
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Donnie S Stapleton
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Kelly A Mitok
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Matthew J Merrins
  - Department of Medicine, Division of Endocrinology, University of Wisconsin-Madison, Madison, United States
  - William S. Middleton Memorial Veterans Hospital, Madison, United States
- Mark P Keller
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
- Alan D Attie
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, United States
  - Department of Medicine, Division of Endocrinology, University of Wisconsin-Madison, Madison, United States
  - Department of Chemistry, University of Wisconsin-Madison, Madison, United States

3. Keller MP, Hudkins KL, Shalev A, Bhatnagar S, Kebede MA, Merrins MJ, Davis DB, Alpers CE, Kimple ME, Attie AD. What the BTBR/J mouse has taught us about diabetes and diabetic complications. iScience 2023; 26:107036. [PMID: 37360692 PMCID: PMC10285641 DOI: 10.1016/j.isci.2023.107036] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 06/28/2023]
Abstract
Human and mouse genetics have delivered numerous diabetogenic loci, but it is mainly through the use of animal models that the pathophysiological basis for their contribution to diabetes has been investigated. More than 20 years ago, we serendipitously identified a mouse strain that could serve as a model of obesity-prone type 2 diabetes, the BTBR (Black and Tan Brachyury) mouse (BTBR T+ Itpr3tf/J, 2018) carrying the Lepob mutation. We went on to discover that the BTBR-Lepob mouse is an excellent model of diabetic nephropathy and is now widely used by nephrologists in academia and the pharmaceutical industry. In this review, we describe the motivation for developing this animal model, the many genes identified and the insights about diabetes and diabetes complications derived from >100 studies conducted in this remarkable animal model.
Affiliation(s)
- Mark P. Keller
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
- Kelly L. Hudkins
  - Department of Pathology, University of Washington Medical Center, Seattle, WA 98195, USA
- Anath Shalev
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Sushant Bhatnagar
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Melkam A. Kebede
  - School of Medical Sciences, Faculty of Medicine and Health, Charles Perkins Centre, University of Sydney, Camperdown, Sydney, NSW 2006, Australia
- Matthew J. Merrins
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA
  - William S. Middleton Memorial Veterans Hospital, Madison, WI 53705, USA
- Dawn Belt Davis
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA
  - William S. Middleton Memorial Veterans Hospital, Madison, WI 53705, USA
- Charles E. Alpers
  - Department of Pathology, University of Washington Medical Center, Seattle, WA 98195, USA
- Michelle E. Kimple
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA
  - William S. Middleton Memorial Veterans Hospital, Madison, WI 53705, USA
- Alan D. Attie
  - Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706, USA
  - Department of Medicine, Division of Endocrinology, Diabetes, and Metabolism, University of Wisconsin School of Medicine and Public Health, Madison, WI 53705, USA
  - Department of Chemistry, University of Wisconsin-Madison, Madison, WI 53706, USA

4. Wei Z, Lee TCM. High-dimensional Multi-Task Learning using Multivariate Regression and Generalized Fiducial Inference. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2022.2090946] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Affiliation(s)
- Zhenyu Wei
  - Department of Statistics, University of California, Davis, USA

5. Dixit A, Roy V. Analyzing relevance vector machines using a single penalty approach. Stat Anal Data Min 2022. [DOI: 10.1002/sam.11551] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Affiliation(s)
- Anand Dixit
  - Department of Statistics, Iowa State University, Ames, Iowa, USA
- Vivekananda Roy
  - Department of Statistics, Iowa State University, Ames, Iowa, USA

6. Li B, Duan H, Wang S, Wu J, Li Y. Establishment of an Artificial Neural Network Model Using Immune-Infiltration Related Factors for Endometrial Receptivity Assessment. Vaccines (Basel) 2022; 10:139. [PMID: 35214598 PMCID: PMC8875905 DOI: 10.3390/vaccines10020139] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/24/2021] [Revised: 01/05/2022] [Accepted: 01/14/2022] [Indexed: 01/27/2023]
Abstract
Background: A comprehensive clinical strategy for infertility involves treatment and, more importantly, post-treatment evaluation. As a component of assessment, endometrial receptivity does not have a validated tool. This study was anchored on immune factors, which are critical factors affecting embryonic implantation. We aimed at establishing novel approaches for assessing endometrial receptivity to guide clinical practice. Methods: Immune-infiltration levels in the GSE58144 dataset (n = 115) from GEO were analysed by digital deconvolution and validated by immunofluorescence (n = 23). Then, modules that were most associated with M1/M2 macrophages and their hub genes were selected by weighted gene co-expression network as well as univariate analyses and validated using the GSE5099 macrophage dataset and qPCR analysis (n = 19). Finally, the artificial neural network model was established from hub genes and its predictive efficacy validated using the GSE165004 dataset (n = 72). Results: Dysregulation of M1 to M2 macrophage ratio is an important factor contributing to defective endometrial receptivity. M1/M2 related gene modules were enriched in three biological processes in macrophages: antigen presentation, interleukin-1-mediated signalling pathway, and phagosome acidification. Their hub genes were significantly altered in patients and associated with ribosomal, lysosomal, and proteasomal pathways. The established model exhibited an excellent predictive value in both datasets, with an accuracy of 98.3% and an AUC of 0.975 (95% CI 0.945–1). Conclusions: M1/M2 polarization influences endometrial receptivity by regulating three gene modules, while the established ANN model can be used to effectively assess endometrial receptivity to inform pregnancy and individualized clinical management strategies.
Affiliation(s)
- Bohan Li
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, Beijing 100006, China
- Hua Duan (corresponding author)
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, Beijing 100006, China
- Sha Wang
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, Beijing 100006, China
- Jiajing Wu
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Capital Medical University, Beijing 100069, China
- Yazhu Li
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing Maternal and Child Health Care Hospital, Beijing 100006, China

7. Staerk C, Mayr A. Randomized boosting with multivariable base-learners for high-dimensional variable selection and prediction. BMC Bioinformatics 2021; 22:441. [PMID: 34530737 PMCID: PMC8447543 DOI: 10.1186/s12859-021-04340-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/24/2021] [Accepted: 08/24/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Statistical boosting is a computational approach to select and estimate interpretable prediction models for high-dimensional biomedical data, leading to implicit regularization and variable selection when combined with early stopping. Traditionally, the set of base-learners is fixed for all iterations and consists of simple regression learners including only one predictor variable at a time. Furthermore, the number of iterations is typically tuned by optimizing the predictive performance, leading to models which often include unnecessarily large numbers of noise variables.
RESULTS: We propose three consecutive extensions of classical component-wise gradient boosting. In the first extension, called Subspace Boosting (SubBoost), base-learners can consist of several variables, allowing for multivariable updates in a single iteration. To compensate for the larger flexibility, the ultimate selection of base-learners is based on information criteria leading to an automatic stopping of the algorithm. As the second extension, Random Subspace Boosting (RSubBoost) additionally includes a random preselection of base-learners in each iteration, enabling the scalability to high-dimensional data. In a third extension, called Adaptive Subspace Boosting (AdaSubBoost), an adaptive random preselection of base-learners is considered, focusing on base-learners which have proven to be predictive in previous iterations. Simulation results show that the multivariable updates in the three subspace algorithms are particularly beneficial in cases of high correlations among signal covariates. In several biomedical applications the proposed algorithms tend to yield sparser models than classical statistical boosting, while showing a very competitive predictive performance also compared to penalized regression approaches like the (relaxed) lasso and the elastic net.
CONCLUSIONS: The proposed randomized boosting approaches with multivariable base-learners are promising extensions of statistical boosting, particularly suited for highly-correlated and sparse high-dimensional settings. The incorporated selection of base-learners via information criteria induces automatic stopping of the algorithms, promoting sparser and more interpretable prediction models.
Affiliation(s)
- Christian Staerk
  - Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany
- Andreas Mayr
  - Department of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Venusberg-Campus 1, 53127, Bonn, Germany

8. Wu S, Hannig J, Lee TCM. Uncertainty quantification for principal component regression. Electron J Stat 2021. [DOI: 10.1214/21-ejs1837] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/19/2022]
Affiliation(s)
- Suofei Wu
  - Department of Statistics, University of California, Davis, One Shields Avenue, Davis, CA 95616, U.S.A.
- Jan Hannig
  - Department of Statistics & Operations Research, 318 Hanes Hall, University of North Carolina at Chapel Hill, NC 27599, U.S.A.
- Thomas C. M. Lee
  - Department of Statistics, University of California, Davis, One Shields Avenue, Davis, CA 95616, U.S.A.

9. Staerk C, Kateri M, Ntzoufras I. High-dimensional variable selection via low-dimensional adaptive learning. Electron J Stat 2021. [DOI: 10.1214/21-ejs1797] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]

10. Li B, Wang S, Duan H, Wang Y, Guo Z. Discovery of gene module acting on ubiquitin-mediated proteolysis pathway by co-expression network analysis for endometriosis. Reprod Biomed Online 2020; 42:429-441. [PMID: 33189575 DOI: 10.1016/j.rbmo.2020.10.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/22/2020] [Revised: 10/02/2020] [Accepted: 10/07/2020] [Indexed: 12/14/2022]
Abstract
RESEARCH QUESTION: Is abnormal gene module expression in the eutopic endometrium related to the occurrence of endometriosis?
DESIGN: Nine datasets of normal and eutopic endometrium were searched and collected through the National Center for Biotechnology Information Gene Expression Omnibus, which included genome-wide expression studies of 71 normal cases and 142 endometriosis cases. Surrogate variable analysis was used for dataset integration. The network module and hub genes were selected by weighted gene co-expression network analysis. Machine learning was used to establish a diagnostic model of endometriosis.
RESULTS: A gene module that was most relevant to endometriosis was selected through weighted gene co-expression network analysis. After further analysis of this module, four hub genes that represent the function of this module were selected: SCAF11, KRAS, MDM2 and KIF3A. Kyoto Encyclopedia of Genes and Genomes enrichment analysis of the four hub genes revealed that all of them were most highly correlated with genes enriched in the ubiquitin-mediated proteolysis pathway. Moreover, in the correlation analysis between hub genes and Jab1, SCAF11 was found to be closely related to Jab1. Furthermore, hub genes were effective indicators for clinical diagnosis. The deep machine learning diagnostic model based on hub genes was highly sensitive.
CONCLUSIONS: The gene module identified is highly correlated with endometriosis. The four hub genes in this module degrade p27kip1 through the ubiquitin-mediated proteolysis pathway to regulate the endometrium cell cycle and affect the development of endometriosis. The hub genes and the deep learning model based on them are valuable for clinical diagnosis.
Affiliation(s)
- Bohan Li
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100006, China
- Sha Wang
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100006, China
- Hua Duan
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100006, China
- Yiyi Wang
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100006, China
- Zhengchen Guo
  - Department of Minimally Invasive Gynecologic Center, Beijing Obstetrics and Gynecology Hospital, Capital Medical University, Beijing 100006, China

11. Nussbaumer T, Debnath O, Wagner C, Heidari P. TraitCorr as a workbench for correlating gene expression measurements with phenotypic data. Gene Reports 2020. [DOI: 10.1016/j.genrep.2020.100649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

12.
Abstract
Background: MicroRNAs (miRNA) are important regulators of gene expression and may influence phenotypes and disease traits. The connection between genetics and miRNA expression can be determined through expression quantitative trait loci (eQTL) analysis, which has been extensively used in a variety of tissues, and in both human and model organisms. miRNA play an important role in brain-related diseases, but eQTL studies of miRNA in brain tissue are limited. We aim to catalog miRNA eQTL in brain tissue using miRNA expression measured on a recombinant inbred mouse panel. Because samples were collected without any intervention or treatment (naïve), the panel allows characterization of genetic influences on miRNAs' expression levels. We used brain RNA expression levels of 881 miRNA and 1416 genomic locations to identify miRNA eQTL. To address multiple testing, we employed permutation p-values and subsequent zero permutation p-value correction. We also investigated the underlying biology of miRNA regulation using additional analyses, including hotspot analysis to search for regions controlling multiple miRNAs, and Bayesian network analysis to identify scenarios where a miRNA mediates the association between genotype and mRNA expression. We used addiction-related phenotypes to illustrate the utility of our results.
Results: Thirty-eight miRNA eQTL were identified after appropriate multiple testing corrections. Ten of these miRNAs had target genes enriched for brain-related pathways and mapped to four miRNA eQTL hotspots. Bayesian network analysis revealed four biological networks relating genetic variation, miRNA expression and gene expression.
Conclusions: Our extensive evaluation of miRNA eQTL provides valuable insight into the role of miRNA regulation in brain tissue. Our miRNA eQTL analysis and extended statistical exploration identify miRNA candidates in brain for future study.
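The abstract mentions permutation p-values with a correction for zero p-values but does not say which correction was applied. As a minimal sketch of one common choice, the add-one estimator (b + 1) / (m + 1), where b counts permuted statistics at least as large as the observed one and m is the number of permutations, the code below tests a single marker against a single expression trait. The two-class genotype coding (as in a recombinant inbred panel), the absolute difference in class means as the test statistic, and all function names are assumptions for illustration, not the authors' pipeline.

```python
import random
from typing import List, Sequence


def corrected_permutation_pvalue(observed: float, null_stats: Sequence[float]) -> float:
    """Add-one permutation p-value: never exactly zero, however extreme `observed` is."""
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


def abs_mean_difference(genotypes: Sequence[int], expression: Sequence[float]) -> float:
    """Absolute difference in mean expression between genotype classes 0 and 1."""
    a = [e for g, e in zip(genotypes, expression) if g == 0]
    b = [e for g, e in zip(genotypes, expression) if g == 1]
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_test(genotypes: Sequence[int], expression: Sequence[float],
                     n_perm: int = 999, seed: int = 0) -> float:
    """Shuffle expression values across strains to build the null distribution."""
    rng = random.Random(seed)
    observed = abs_mean_difference(genotypes, expression)
    shuffled: List[float] = list(expression)
    null_stats = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_stats.append(abs_mean_difference(genotypes, shuffled))
    return corrected_permutation_pvalue(observed, null_stats)
```

With this estimator the smallest attainable p-value is 1 / (m + 1), so the number of permutations sets the resolution available to any later multiple-testing adjustment.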

13. Narisetty NN, Shen J, He X. Skinny Gibbs: A Consistent and Scalable Gibbs Sampler for Model Selection. J Am Stat Assoc 2018. [DOI: 10.1080/01621459.2018.1482754] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 10/14/2022]
Affiliation(s)
- Naveen N. Narisetty
  - Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, IL
- Juan Shen
  - Department of Statistics, Fudan University, Shanghai, China
- Xuming He
  - Department of Statistics, University of Michigan, Ann Arbor, MI

14. Jiang H. Sparse estimation based on square root nonconvex optimization in high-dimensional data. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2017.12.025] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]

15. Shin M, Bhattacharya A, Johnson VE. Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-dimensional Settings. Stat Sin 2018; 28:1053-1078. [PMID: 29643721 DOI: 10.5705/ss.202016.0167] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Abstract
Bayesian model selection procedures based on nonlocal alternative prior densities are extended to ultrahigh dimensional settings and compared to other variable selection procedures using precision-recall curves. Variable selection procedures included in these comparisons include methods based on g-priors, reciprocal lasso, adaptive lasso, scad, and minimax concave penalty criteria. The use of precision-recall curves eliminates the sensitivity of our conclusions to the choice of tuning parameters. We find that Bayesian variable selection procedures based on nonlocal priors are competitive to all other procedures in a range of simulation scenarios, and we subsequently explain this favorable performance through a theoretical examination of their consistency properties. When certain regularity conditions apply, we demonstrate that the nonlocal procedures are consistent for linear models even when the number of covariates p increases sub-exponentially with the sample size n. A model selection procedure based on Zellner's g-prior is also found to be competitive with penalized likelihood methods in identifying the true model, but the posterior distribution on the model space induced by this method is much more dispersed than the posterior distribution induced on the model space by the nonlocal prior methods. We investigate the asymptotic form of the marginal likelihood based on the nonlocal priors and show that it attains a unique term that cannot be derived from the other Bayesian model selection procedures. We also propose a scalable and efficient algorithm called Simplified Shotgun Stochastic Search with Screening (S5) to explore the enormous model space, and we show that S5 dramatically reduces the computing time without losing the capacity to search the interesting region in the model space, at least in the simulation settings considered. The S5 algorithm is available in an R package BayesS5 on CRAN.
Affiliation(s)
- Minsuk Shin
  - Department of Statistics, Texas A&M University, Texas, U.S.A.
- Valen E Johnson
  - Department of Statistics, Texas A&M University, Texas, U.S.A.

16. Jiang H, Dong Y. Dimension reduction based on a penalized kernel support vector machine model. Knowl Based Syst 2017. [DOI: 10.1016/j.knosys.2017.09.041] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 11/29/2022]

17. Cheng R, Doerge RW, Borevitz J. Novel Resampling Improves Statistical Power for Multiple-Trait QTL Mapping. G3 (Bethesda) 2017; 7:813-822. [PMID: 28064191 PMCID: PMC5345711 DOI: 10.1534/g3.116.037531] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 11/16/2016] [Accepted: 12/29/2016] [Indexed: 01/13/2023]
Abstract
Multiple-trait analysis typically employs models that associate a quantitative trait locus (QTL) with all of the traits. As a result, statistical power for QTL detection may not be optimal if the QTL contributes to the phenotypic variation in only a small proportion of the traits. Excluding QTL effects that contribute little to the test statistic can improve statistical power. In this article, we show that an optimal power can be achieved when the number of QTL effects is best estimated, and that a stringent criterion for QTL effect selection may improve power when the number of QTL effects is small but can reduce power otherwise. We investigate strategies for excluding trivial QTL effects, and propose a method that improves statistical power when the number of QTL effects is relatively small, and fairly maintains the power when the number of QTL effects is large. The proposed method first uses resampling techniques to determine the number of nontrivial QTL effects, and then selects QTL effects by the backward elimination procedure for significance test. We also propose a method for testing QTL-trait associations that are desired for biological interpretation in applications. We validate our methods using simulations and Arabidopsis thaliana transcript data.
Affiliation(s)
- Riyan Cheng
  - Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia
  - ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia
- R W Doerge
  - Department of Statistics, Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA 15213
- Justin Borevitz
  - Research School of Biology, The Australian National University, Acton, Australian Capital Territory 2601, Australia
  - ARC Center of Excellence in Plant Energy Biology, The Australian National University, Acton, ACT 2601, Australia

18. Lusis AJ, Seldin MM, Allayee H, Bennett BJ, Civelek M, Davis RC, Eskin E, Farber CR, Hui S, Mehrabian M, Norheim F, Pan C, Parks B, Rau CD, Smith DJ, Vallim T, Wang Y, Wang J. The Hybrid Mouse Diversity Panel: a resource for systems genetics analyses of metabolic and cardiovascular traits. J Lipid Res 2016; 57:925-42. [PMID: 27099397 PMCID: PMC4878195 DOI: 10.1194/jlr.r066944] [Citation(s) in RCA: 113] [Impact Index Per Article: 14.1] [Received: 02/02/2016] [Revised: 04/12/2016] [Indexed: 02/07/2023]
Abstract
The Hybrid Mouse Diversity Panel (HMDP) is a collection of approximately 100 well-characterized inbred strains of mice that can be used to analyze the genetic and environmental factors underlying complex traits. While not nearly as powerful for mapping genetic loci contributing to the traits as human genome-wide association studies, it has some important advantages. First, environmental factors can be controlled. Second, relevant tissues are accessible for global molecular phenotyping. Finally, because inbred strains are renewable, results from separate studies can be integrated. Thus far, the HMDP has been studied for traits relevant to obesity, diabetes, atherosclerosis, osteoporosis, heart failure, immune regulation, fatty liver disease, and host-gut microbiota interactions. High-throughput technologies have been used to examine the genomes, epigenomes, transcriptomes, proteomes, metabolomes, and microbiomes of the mice under various environmental conditions. All of the published data are available and can be readily used to formulate hypotheses about genes, pathways and interactions.
Affiliation(s)
- Aldons J Lusis
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Microbiology, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Human Genetics, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Marcus M Seldin
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Hooman Allayee
- Department of Preventive Medicine, University of Southern California Keck School of Medicine, Los Angeles, CA
- Brian J Bennett
- Department of Genetics, University of North Carolina, Chapel Hill, NC
- Mete Civelek
- Departments of Biomedical Engineering, University of Virginia, Charlottesville, VA
- Richard C Davis
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Eleazar Eskin
- Departments of Computer Science, University of California-Los Angeles, Los Angeles, CA
- Charles R Farber
- Public Health Sciences, University of Virginia, Charlottesville, VA
- Simon Hui
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Margarete Mehrabian
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Frode Norheim
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Calvin Pan
- Human Genetics, University of California-Los Angeles, Los Angeles, CA
- Brian Parks
- Department of Nutritional Sciences, University of Wisconsin-Madison, Madison, WI
- Christoph D Rau
- Anesthesiology, University of California-Los Angeles, Los Angeles, CA
- Desmond J Smith
- Molecular and Medical Pharmacology, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Thomas Vallim
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
- Yibin Wang
- Anesthesiology, University of California-Los Angeles, Los Angeles, CA
- Jessica Wang
- Departments of Medicine, David Geffen School of Medicine, University of California-Los Angeles, Los Angeles, CA
|
19
|
Wang L, Jiao Y, Wang Y, Zhang M, Gu W. Self-Confirmation and Ascertainment of the Candidate Genomic Regions of Complex Trait Loci - A None-Experimental Solution. PLoS One 2016; 11:e0153676. [PMID: 27203862 PMCID: PMC4874692 DOI: 10.1371/journal.pone.0153676] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/07/2015] [Accepted: 04/01/2016] [Indexed: 01/24/2023]
Abstract
Over the past half century, thousands of quantitative trait loci (QTL) have been identified using animal models and plant populations. However, the unreliability and imprecision of the genomic regions of these loci have remained the major hurdle to identifying the causal genes for the corresponding traits. We used a non-experimental strategy of strain number reduction to test the accuracy of, and to ascertain, the candidate region for a QTL. We tested the strategy in over 400 analyses with data from 47 studies. These studies include: 1) studies with recombinant inbred (RI) strains of mice, in which we first tested two previously mapped QTL with well-defined genomic regions, then tested four additional studies with known QTL regions, and finally examined the reliability of QTL in 38 data sets produced from relatively large numbers of RI strains derived from C57BL/6J (B6) X DBA/2J (D2), known as BXD RI mouse strains; 2) studies with RI strains of rats and plants; and 3) studies using F2 populations in mice, rats and plants. In these cases, our method assessed the reliability of mapped QTL and localized the candidate genes into the defined genomic regions. Our data also suggest that the LRS score produced by permutation tests does not necessarily confirm the reliability of a QTL. Nor is the number of strains a reliable indicator of QTL accuracy. Our strategy determines the reliability and accuracy of the genomic region of a QTL without any additional experimental study such as congenic breeding.
Affiliation(s)
- Lishi Wang
- Department of Orthopedic Surgery & BME, -Campbell-Clinic, University of Tennessee Health Science Center, Memphis, TN, 38163, United States of America
- Department of Basic Research, Inner Mongolia Medical University, Inner Mongolia, 010110, PR China
- Yan Jiao
- Department of Orthopedic Surgery & BME, -Campbell-Clinic, University of Tennessee Health Science Center, Memphis, TN, 38163, United States of America
- Mudanjiang Medical College, Mudanjiang, Heilongjiang, 157001, PR China
- Yongjun Wang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, 100050, PR China
- Mengchen Zhang
- National Center of Soybean Research, Institute of Hebei Cereal and Oil Crops, Hebei Academy of Agriculture and Forestry Sciences, Shijiazhuang, Hebei, 050011, PR China
- Weikuan Gu
- Department of Orthopedic Surgery & BME, -Campbell-Clinic, University of Tennessee Health Science Center, Memphis, TN, 38163, United States of America
- Research Service, Veterans Affairs Medical Center, 1030 Jefferson Avenue, Memphis, TN, 38104, United States of America
|
20
|
|
21
|
Statistical and Computational Methods for Genetic Diseases: An Overview. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2015; 2015:954598. [PMID: 26106440 PMCID: PMC4464008 DOI: 10.1155/2015/954598] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 09/16/2014] [Accepted: 04/23/2015] [Indexed: 12/19/2022]
Abstract
The identification of the causes of genetic diseases has been carried out by several approaches of increasing complexity. Innovation in genetic methodologies produces large amounts of data that need the support of statistical and computational methods to be processed correctly. The aim of the paper is to provide an overview of statistical and computational methods, with attention to methods for sequence analysis and complex diseases.
|
22
|
Jiang B, Liu JS. Bayesian Partition Models for Identifying Expression Quantitative Trait Loci. J Am Stat Assoc 2015; 110:1350-1361. [PMID: 29056798 DOI: 10.1080/01621459.2015.1049746] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/23/2022]
Abstract
Expression quantitative trait loci (eQTLs) are genomic locations associated with changes of expression levels of certain genes. By assaying gene expressions and genetic variations simultaneously on a genome-wide scale, scientists wish to discover genomic loci responsible for expression variations of a set of genes. The task can be viewed as a multivariate regression problem with variable selection on both responses (gene expression) and covariates (genetic variations), including also multi-way interactions among covariates. Instead of learning a predictive model of quantitative trait given combinations of genetic markers, we adopt an inverse modeling perspective to model the distribution of genetic markers conditional on gene expression traits. A particular strength of our method is its ability to detect interactive effects of genetic variations with high power even when their marginal effects are weak, addressing a key weakness of many existing eQTL mapping methods. Furthermore, we introduce a hierarchical model to capture the dependence structure among correlated genes. Through simulation studies and a real data example in yeast, we demonstrate how our Bayesian hierarchical partition model achieves a significantly improved power in detecting eQTLs compared to existing methods.
Affiliation(s)
- Bo Jiang
- Harvard University, Cambridge, MA 02138
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, MA 02138
|
23
|
Song Q, Liang F. A split‐and‐merge Bayesian variable selection approach for ultrahigh dimensional regression. J R Stat Soc Series B Stat Methodol 2014. [DOI: 10.1111/rssb.12095] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 12/01/2022]
|
24
|
Abstract
In ultra-high dimensional data analysis, it is extremely challenging to identify important interaction effects, and a top concern in practice is computational feasibility. For a data set with n observations and p predictors, the augmented design matrix including all linear and order-2 terms is of size n × (p² + 3p)/2. When p is large, say more than tens of hundreds, the number of interactions is enormous and beyond the capacity of standard machines and software tools for storage and analysis. In theory, the interaction selection consistency is hard to achieve in high dimensional settings. Interaction effects have heavier tails and more complex covariance structures than main effects in a random design, making theoretical analysis difficult. In this article, we propose to tackle these issues by forward-selection based procedures called iFOR, which identify interaction effects in a greedy forward fashion while maintaining the natural hierarchical model structure. Two algorithms, iFORT and iFORM, are studied. Computationally, the iFOR procedures are designed to be simple and fast to implement. No complex optimization tools are needed, since only OLS-type calculations are involved; the iFOR algorithms avoid storing and manipulating the whole augmented matrix, so the memory and CPU requirement is minimal; the computational complexity is linear in p for sparse models, hence feasible for p ≫ n. Theoretically, we prove that they possess sure screening property for ultra-high dimensional settings. Numerical examples are used to demonstrate their finite sample performance.
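The column count quoted in this abstract can be checked by enumeration: p linear terms plus p(p + 1)/2 squares and pairwise products give (p² + 3p)/2 columns. The following is a minimal Python sketch of that count only, not the iFOR procedure; the function names `n_augmented_columns` and `augmented_row` are hypothetical and chosen for illustration.

```python
from itertools import combinations_with_replacement


def n_augmented_columns(p):
    """Number of linear plus order-2 terms for p predictors: (p^2 + 3p) / 2."""
    # p * (p + 3) is always even, so integer division is exact.
    return (p * p + 3 * p) // 2


def augmented_row(x):
    """Expand one observation into its linear and order-2 terms.

    The order-2 terms are every product x[j] * x[k] with j <= k,
    i.e. the squares and the pairwise interactions.
    """
    p = len(x)
    order2 = [x[j] * x[k] for j, k in combinations_with_replacement(range(p), 2)]
    return list(x) + order2


row = augmented_row([1.0, 2.0, 3.0, 4.0])
print(len(row), n_augmented_columns(4))
print(n_augmented_columns(10_000))
```

The count grows quadratically in p, so materializing the full augmented matrix quickly becomes impractical for large p, which is the storage concern the abstract raises.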
Affiliation(s)
- Ning Hao
- Assistant Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721
- Hao Helen Zhang
- Associate Professor, Department of Mathematics, University of Arizona, Tucson, AZ 85721
|
25
|
|
26
|
Abstract
Hybrid dysfunction, a common feature of reproductive barriers between species, is often caused by negative epistasis between loci ("Dobzhansky-Muller incompatibilities"). The nature and complexity of hybrid incompatibilities remain poorly understood because identifying interacting loci that affect complex phenotypes is difficult. With subspecies in the early stages of speciation, an array of genetic tools, and detailed knowledge of reproductive biology, house mice (Mus musculus) provide a model system for dissecting hybrid incompatibilities. Male hybrids between M. musculus subspecies often show reduced fertility. Previous studies identified loci and several X chromosome-autosome interactions that contribute to sterility. To characterize the genetic basis of hybrid sterility in detail, we used a systems genetics approach, integrating mapping of gene expression traits with sterility phenotypes and QTL. We measured genome-wide testis expression in 305 male F2s from a cross between wild-derived inbred strains of M. musculus musculus and M. m. domesticus. We identified several thousand cis- and trans-acting QTL contributing to expression variation (eQTL). Many trans eQTL cluster into eleven 'hotspots,' seven of which co-localize with QTL for sterility phenotypes identified in the cross. The number and clustering of trans eQTL-but not cis eQTL-were substantially lower when mapping was restricted to a 'fertile' subset of mice, providing evidence that trans eQTL hotspots are related to sterility. Functional annotation of transcripts with eQTL provides insights into the biological processes disrupted by sterility loci and guides prioritization of candidate genes. Using a conditional mapping approach, we identified eQTL dependent on interactions between loci, revealing a complex system of epistasis. Our results illuminate established patterns, including the role of the X chromosome in hybrid sterility. 
The integrated mapping approach we employed is applicable in a broad range of organisms and we advocate for widespread adoption of a network-centered approach in speciation genetics.
|
27
|
Gao C, Ju Z, Li S, Zuo J, Fu D, Tian H, Luo Y, Zhu B. Deciphering ascorbic acid regulatory pathways in ripening tomato fruit using a weighted gene correlation network analysis approach. JOURNAL OF INTEGRATIVE PLANT BIOLOGY 2013; 55:1080-1091. [PMID: 23718676 DOI: 10.1111/jipb.12079] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/09/2013] [Accepted: 05/21/2013] [Indexed: 06/02/2023]
Abstract
Genotype is generally determined by the co-expression of diverse genes and multiple regulatory pathways in plants. Gene co-expression analysis combined with physiological trait data provides very important information about gene function and regulatory mechanisms. L-Ascorbic acid (AsA), which is an essential nutrient component for human health and plant metabolism, plays key roles in diverse biological processes such as cell cycle, cell expansion, stress resistance, hormone synthesis, and signaling. Here, we applied a weighted gene correlation network analysis approach based on gene expression values and AsA content data in ripening tomato (Solanum lycopersicum L.) fruit with different AsA content levels, which led to the identification of AsA-relevant modules and vital genes in AsA regulatory pathways. Twenty-four modules were compartmentalized according to gene expression profiling. Among these modules, one negatively related module containing genes involved in redox processes and one positively related module enriched with genes involved in AsA biosynthetic and recycling pathways were further analyzed. The present work indicates that redox pathways as well as hormone-signal pathways are closely correlated with AsA accumulation in ripening tomato fruit, and allows us to prioritize candidate genes for follow-up studies to dissect this interplay at the biochemical and molecular level.
Affiliation(s)
- Chao Gao
- Laboratory of Fruit Biology, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing, 100083, China
|
28
|
|
29
|
Zhang F, Gao B, Xu L, Li C, Hao D, Zhang S, Zhou M, Su F, Chen X, Zhi H, Li X. Allele-specific behavior of molecular networks: understanding small-molecule drug response in yeast. PLoS One 2013; 8:e53581. [PMID: 23308257 PMCID: PMC3537669 DOI: 10.1371/journal.pone.0053581] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 07/22/2012] [Accepted: 11/30/2012] [Indexed: 11/18/2022]
Abstract
The study of systems genetics is changing the way the genetic and molecular basis of phenotypic variation, such as disease susceptibility and drug response, is being analyzed. Moreover, systems genetics aids in the translation of insights from systems biology into genetics. The use of systems genetics enables greater attention to be focused on the potential impact of genetic perturbations on the molecular states of networks that in turn affects complex traits. In this study, we developed models to detect allele-specific perturbations on interactions, in which a genetic locus with alternative alleles exerted a differing influence on an interaction. We utilized the models to investigate the dynamic behavior of an integrated molecular network undergoing genetic perturbations in yeast. Our results revealed the complexity of regulatory relationships between genetic loci and networks, in which different genetic loci perturb specific network modules. In addition, significant within-module functional coherence was found. We then used the network perturbation model to elucidate the underlying molecular mechanisms of individual differences in response to 100 diverse small molecule drugs. As a result, we identified sub-networks in the integrated network that responded to variations in DNA associated with response to diverse compounds and were significantly enriched for known drug targets. Literature mining results provided strong independent evidence for the effectiveness of these genetic perturbing networks in the elucidation of small-molecule responses in yeast.
Affiliation(s)
- Fan Zhang
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Bo Gao
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Liangde Xu
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Chunquan Li
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Dapeng Hao
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Shaojun Zhang
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Meng Zhou
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Fei Su
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Xi Chen
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Hui Zhi
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
- Xia Li
- College of Bioinformatics Science and Technology and The Second Affiliated Hospital, Harbin Medical University, Harbin, P. R. China
|
30
|
Bondell HD, Reich BJ. Consistent high-dimensional Bayesian variable selection via penalized credible regions. J Am Stat Assoc 2012; 107:1610-1624. [PMID: 23482517 DOI: 10.1080/01621459.2012.716344] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 10/28/2022]
Abstract
For high-dimensional data, particularly when the number of predictors greatly exceeds the sample size, selection of relevant predictors for regression is a challenging problem. Methods such as sure screening, forward selection, or penalized regressions are commonly used. Bayesian variable selection methods place prior distributions on the parameters along with a prior over model space, or equivalently, a mixture prior on the parameters having mass at zero. Since exhaustive enumeration is not feasible, posterior model probabilities are often obtained via long MCMC runs. The chosen model can depend heavily on various choices for priors and also posterior thresholds. Alternatively, we propose a conjugate prior only on the full model parameters and use sparse solutions within posterior credible regions to perform selection. These posterior credible regions often have closed-form representations, and it is shown that these sparse solutions can be computed via existing algorithms. The approach is shown to outperform common methods in the high-dimensional setting, particularly under correlation. By searching for a sparse solution within a joint credible region, consistent model selection is established. Furthermore, it is shown that, under certain conditions, the use of marginal credible intervals can give consistent selection up to the case where the dimension grows exponentially in the sample size. The proposed approach successfully accomplishes variable selection in the high-dimensional setting, while avoiding pitfalls that plague typical Bayesian variable selection methods.
Affiliation(s)
- Howard D Bondell
- Department of Statistics, North Carolina State University, Box 8203, Raleigh, NC 27695, U.S.A
|
31
|
Wright FA, Shabalin AA, Rusyn I. Computational tools for discovery and interpretation of expression quantitative trait loci. Pharmacogenomics 2012; 13:343-52. [PMID: 22304583 DOI: 10.2217/pgs.11.185] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
Abstract
Expression quantitative trait locus (eQTL) analysis is rapidly moving from a cutting-edge concept in genomics to a mature area of investigation, with important connections to genome-wide association studies for human disease, pharmacogenomics and toxicogenomics. Despite the importance of the topic, many investigators must develop their own code or use tools not specifically suited for eQTL analysis. Convenient computational tools are becoming available, but they are not widely publicized, and investigators who are interested in discovery of eQTL, or in using them to interpret genome-wide association study results, may have difficulty navigating the available resources. The purpose of this review is to help investigators find appropriate programs for eQTL analysis and interpretation.
Affiliation(s)
- Fred A Wright
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
|
32
|
|
33
|
Houten SM, Denis S, Argmann CA, Jia Y, Ferdinandusse S, Reddy JK, Wanders RJA. Peroxisomal L-bifunctional enzyme (Ehhadh) is essential for the production of medium-chain dicarboxylic acids. J Lipid Res 2012; 53:1296-303. [PMID: 22534643 DOI: 10.1194/jlr.m024463] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Indexed: 01/15/2023]
Abstract
L-bifunctional enzyme (Ehhadh) is part of the classical peroxisomal fatty acid β-oxidation pathway. This pathway is highly inducible via peroxisome proliferator-activated receptor α (PPARα) activation. However, no specific substrates or functions for Ehhadh are known, and Ehhadh knockout (KO) mice display no appreciable changes in lipid metabolism. To investigate Ehhadh functions, we used a bioinformatics approach and found that Ehhadh expression covaries with genes involved in the tricarboxylic acid cycle and in mitochondrial and peroxisomal fatty acid oxidation. Based on these findings and the regulation of Ehhadh's expression by PPARα, we hypothesized that the phenotype of Ehhadh KO mice would become apparent after fasting. Ehhadh KO mice tolerated fasting well but displayed a marked deficiency in the fasting-induced production of the medium-chain dicarboxylic acids adipic and suberic acid and of the carnitine esters thereof. The decreased levels of adipic and suberic acid were not due to a deficient induction of ω-oxidation upon fasting, as Cyp4a10 protein levels increased in wild-type and Ehhadh KO mice. We conclude that Ehhadh is indispensable for the production of medium-chain dicarboxylic acids, providing an explanation for the coordinated induction of mitochondrial and peroxisomal oxidative pathways during fasting.
Affiliation(s)
- Sander M Houten
- Department of Clinical Chemistry, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands.
|
34
|
Vignes M, Vandel J, Allouche D, Ramadan-Alban N, Cierco-Ayrolles C, Schiex T, Mangin B, de Givry S. Gene regulatory network reconstruction using Bayesian networks, the Dantzig Selector, the Lasso and their meta-analysis. PLoS One 2011; 6:e29165. [PMID: 22216195 PMCID: PMC3246469 DOI: 10.1371/journal.pone.0029165] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.7] [Received: 07/01/2011] [Accepted: 11/22/2011] [Indexed: 11/18/2022]
Abstract
Modern technologies, and especially next generation sequencing facilities, are giving cheaper access to genotype and genomic data measured on the same sample at once. This creates an ideal situation for multifactorial experiments designed to infer gene regulatory networks. The fifth "Dialogue for Reverse Engineering Assessments and Methods" (DREAM5) challenges are aimed at assessing methods and associated algorithms devoted to the inference of biological networks. Challenge 3 on "Systems Genetics" proposed to infer causal gene regulatory networks from different genetical genomics data sets. We investigated a wide panel of methods ranging from Bayesian networks to penalised linear regressions to analyse such data, and proposed a simple yet very powerful meta-analysis, which combines these inference methods. We present results of the Challenge as well as more in-depth analysis of predicted networks in terms of structure and reliability. The developed meta-analysis was ranked first among the 16 teams participating in Challenge 3A. It paves the way for future extensions of our inference method and more accurate gene network estimates in the context of genetical genomics.
Affiliation(s)
- Matthieu Vignes
- SaAB Team/BIA Unit, INRA Toulouse, Castanet-Tolosan, France.
|
35
|
Zhan H, Xu S. Generalized linear mixed model for segregation distortion analysis. BMC Genet 2011; 12:97. [PMID: 22078575 PMCID: PMC3748016 DOI: 10.1186/1471-2156-12-97] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 09/23/2011] [Accepted: 11/11/2011] [Indexed: 11/16/2022]
Abstract
Background: Segregation distortion is a phenomenon in which the observed genotypic frequencies of a locus fall outside the expected Mendelian segregation ratio. The main cause of segregation distortion is viability selection on linked marker loci. These viability selection loci can be mapped using genome-wide marker information. Results: We developed a generalized linear mixed model (GLMM) under the liability model to jointly map all viability selection loci of the genome. Using a hierarchical generalized linear mixed model, we can handle a number of loci several times larger than the sample size. We used a dataset from an F2 mouse family derived from the cross of two inbred lines to test the model and detected a major segregation distortion locus contributing 75% of the variance of the underlying liability. Replicated simulation experiments confirm that the power of viability locus detection is high and the false positive rate is low. Conclusions: The method can be used not only to detect segregation distortion loci but also to map quantitative trait loci of disease traits using case-only data in humans and selected populations in plants and animals.
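As a rough illustration of the definition of segregation distortion given in this abstract, a single marker in an F2 cross can be tested against the Mendelian 1:2:1 genotype ratio with a chi-square goodness-of-fit test. This is a much simpler single-locus check, not the GLMM the abstract describes; the function name `f2_segregation_test` and the genotype counts are hypothetical.

```python
import math


def f2_segregation_test(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against the Mendelian 1:2:1 F2 ratio.

    Returns the test statistic and its p-value.
    """
    observed = (n_aa, n_ab, n_bb)
    total = sum(observed)
    expected = (0.25 * total, 0.5 * total, 0.25 * total)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Three genotype classes leave 2 degrees of freedom, and the chi-square
    # survival function with 2 degrees of freedom is exactly exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Hypothetical counts: an excess of one homozygote and a deficit of the other.
stat, p = f2_segregation_test(40, 50, 10)
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

A marker-by-marker test like this ignores linkage between markers, which is one reason a joint model over all loci, as in the abstract, may be preferred.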
Affiliation(s)
- Haimao Zhan
- Department of Botany and Plant Sciences, University of California, Riverside, CA 92521, USA
|
36
|
Abstract
High-throughput genomics allows genome-wide quantification of gene expression levels in tissues and cell types and, when combined with sequence variation data, permits the identification of genetic control points of expression (expression QTL or eQTL). Clusters of eQTL influenced by single genetic polymorphisms can inform on hotspots of regulation of pathways and networks, although very few hotspots have been robustly detected, replicated, or experimentally verified. Here we present a novel modeling strategy to estimate the propensity of a genetic marker to influence several expression traits at the same time, based on a hierarchical formulation of related regressions. We implement this hierarchical regression model in a Bayesian framework using a stochastic search algorithm, HESS, that efficiently probes sparse subsets of genetic markers in a high-dimensional data matrix to identify hotspots and to pinpoint the individual genetic effects (eQTL). Simulating complex regulatory scenarios, we demonstrate that our method outperforms current state-of-the-art approaches, in particular when the number of transcripts is large. We also illustrate the applicability of HESS to diverse real-case data sets, in mouse and human genetic settings, and show that it provides new insights into regulatory hotspots that were not detected by conventional methods. The results suggest that the combination of our modeling strategy and algorithmic implementation provides significant advantages for the identification of functional eQTL hotspots, revealing key regulators underlying pathways.
|
37
|
Joosen RVL, Ligterink W, Hilhorst HWM, Keurentjes JJB. Advances in genetical genomics of plants. Curr Genomics 2011; 10:540-9. [PMID: 20514216 PMCID: PMC2817885 DOI: 10.2174/138920209789503914] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 07/04/2009] [Revised: 07/24/2009] [Accepted: 07/29/2009] [Indexed: 11/25/2022]
Abstract
Natural variation provides a valuable resource to study the genetic regulation of quantitative traits. In quantitative trait locus (QTL) analyses this variation, captured in segregating mapping populations, is used to identify the genomic regions affecting these traits. The identification of the causal genes underlying QTLs is a major challenge for which the detection of gene expression differences is of major importance. By combining genetics with large scale expression profiling (i.e. genetical genomics), resulting in expression QTLs (eQTLs), great progress can be made in connecting phenotypic variation to genotypic diversity. In this review we discuss examples from human, mouse, Drosophila, yeast and plant research to illustrate the advances in genetical genomics, with a focus on understanding the regulatory mechanisms underlying natural variation. With their tolerance to inbreeding, short generation time and ease to generate large families, plants are ideal subjects to test new concepts in genetics. The comprehensive resources which are available for Arabidopsis make it a favorite model plant but genetical genomics also found its way to important crop species like rice, barley and wheat. We discuss eQTL profiling with respect to cis and trans regulation and show how combined studies with other ‘omics’ technologies, such as metabolomics and proteomics may further augment current information on transcriptional, translational and metabolomic signaling pathways and enable reconstruction of detailed regulatory networks. The fast developments in the ‘omics’ area will offer great potential for genetical genomics to elucidate the genotype-phenotype relationships for both fundamental and applied research.
Affiliation(s)
- R V L Joosen
- Laboratory of Plant Physiology, Wageningen University, Droevendaalsesteeg 1, NL-6708 PB Wageningen, The Netherlands
|
38
|
Wang W, Zhang X. Network-based group variable selection for detecting expression quantitative trait loci (eQTL). BMC Bioinformatics 2011; 12:269. [PMID: 21718480 PMCID: PMC3152919 DOI: 10.1186/1471-2105-12-269] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/08/2010] [Accepted: 06/30/2011] [Indexed: 11/10/2022] Open
Abstract
Background: Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression level of genes. Penalized regression with a proper penalty is suitable for high-dimensional biological data. Its performance should be enhanced when we incorporate biological knowledge of the gene expression network and the linkage disequilibrium (LD) structure between loci in a high-noise background. Results: We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, the complex joint activity of multiple SNPs can be considered and the dimensionality of the eQTL problem is reduced dramatically. To demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets and treated the additive and dominant effects of each locus as a group; as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs. Conclusions: The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introducing proper gene expression and loci correlation information makes detecting causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
Affiliation(s)
- Weichen Wang
- Mathematics and Physics, School of Sciences, Tsinghua University, Beijing 100084, China.
|
39
|
Reilly Ayala HB, Wacker MA, Siwo G, Ferdig MT. Quantitative trait loci mapping reveals candidate pathways regulating cell cycle duration in Plasmodium falciparum. BMC Genomics 2010; 11:577. [PMID: 20955606 PMCID: PMC3091725 DOI: 10.1186/1471-2164-11-577] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 05/02/2010] [Accepted: 10/18/2010] [Indexed: 11/24/2022] Open
Abstract
Background Elevated parasite biomass in the human red blood cells can lead to increased malaria morbidity. The genes and mechanisms regulating growth and development of Plasmodium falciparum through its erythrocytic cycle are not well understood. We previously showed that strains HB3 and Dd2 diverge in their proliferation rates, and here use quantitative trait loci mapping in 34 progeny from a cross between these parent clones along with integrative bioinformatics to identify genetic loci and candidate genes that control divergences in cell cycle duration. Results Genetic mapping of cell cycle duration revealed a four-locus genetic model, including a major genetic effect on chromosome 12, which accounts for 75% of the inherited phenotype variation. These QTL span 165 genes, the majority of which have no predicted function based on homology. We present a method to systematically prioritize candidate genes using the extensive sequence and transcriptional information available for the parent lines. Putative functions were assigned to the prioritized genes based on protein interaction networks and expression eQTL from our earlier study. DNA metabolism or antigenic variation functional categories were enriched among our prioritized candidate genes. Genes were then analyzed to determine if they interact with cyclins or other proteins known to be involved in the regulation of cell cycle. Conclusions We show that the divergent proliferation rate between a drug resistant and drug sensitive parent clone is under genetic regulation and is segregating as a complex trait in 34 progeny. We map a major locus along with additional secondary effects, and use the wealth of genome data to identify key candidate genes. Of particular interest are a nucleosome assembly protein (PFL0185c), a Zinc finger transcription factor (PFL0465c) both on chromosome 12 and a ribosomal protein L7Ae-related on chromosome 4 (PFD0960c).
|
40
|
Keller B, Martini S, Sedor J, Kretzler M. Linking variants from genome-wide association analysis to function via transcriptional network analysis. Semin Nephrol 2010; 30:177-84. [PMID: 20347646 DOI: 10.1016/j.semnephrol.2010.01.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 10/19/2022]
Abstract
A current challenge in interpretation of genome-wide association studies is to establish the mechanistic links between the measured genotype and observed phenotype. The integration of gene expression with disease genome-wide association studies is emerging as an important strategy for deciphering these regulatory mechanisms. For renal disease, the availability of both tissue- and disease-specific expression data makes the strategy a compelling option. In this review, three approaches of integrating single nucleotide polymorphism (SNP) genotypes with transcriptional regulation are discussed as follows: (1) interpreting the functional role of transcripts affected by a SNP, (2) identifying the mechanistic role of noncoding SNPs in regulation, and (3) identifying regulatory candidate SNPs with expression associations. Combining these strategies in an integrative manner should allow the discovery of more extensive regulatory information. Linking genetics to systems biology more directly promises the opportunity to explain how genetic variants contribute to disease in a truly holistic manner.
Affiliation(s)
- Benjamin Keller
- Computer Science, Eastern Michigan University, Ann Arbor, MI, USA
|
41
|
Piechota M, Korostynski M, Solecki W, Gieryk A, Slezak M, Bilecki W, Ziolkowska B, Kostrzewa E, Cymerman I, Swiech L, Jaworski J, Przewlocki R. The dissection of transcriptional modules regulated by various drugs of abuse in the mouse striatum. Genome Biol 2010; 11:R48. [PMID: 20459597 PMCID: PMC2898085 DOI: 10.1186/gb-2010-11-5-r48] [Citation(s) in RCA: 120] [Impact Index Per Article: 8.6] [Received: 02/09/2010] [Revised: 04/14/2010] [Accepted: 05/04/2010] [Indexed: 01/30/2023] Open
Abstract
BACKGROUND Various drugs of abuse activate intracellular pathways in the brain reward system. These pathways regulate the expression of genes that are essential to the development of addiction. To reveal genes common and distinct for different classes of drugs of abuse, we compared the effects of nicotine, ethanol, cocaine, morphine, heroin and methamphetamine on gene expression profiles in the mouse striatum. RESULTS We applied whole-genome microarray profiling to evaluate detailed time-courses (1, 2, 4 and 8 hours) of transcriptome alterations following acute drug administration in mice. We identified 42 drug-responsive genes that were segregated into two main transcriptional modules. The first module consisted of activity-dependent transcripts (including Fos and Npas4), which are induced by psychostimulants and opioids. The second group of genes (including Fkbp5 and S3-12), which are controlled, in part, by the release of steroid hormones, was strongly activated by ethanol and opioids. Using pharmacological tools, we were able to inhibit the induction of particular modules of drug-related genomic profiles. We selected a subset of genes for validation by in situ hybridization and quantitative PCR. We also showed that knockdown of the drug-responsive genes Sgk1 and Tsc22d3 resulted in alterations to dendritic spines in mice, possibly reflecting an altered potential for plastic changes. CONCLUSIONS Our study identified modules of drug-induced genes that share functional relationships. These genes may play a critical role in the early stages of addiction.
Affiliation(s)
- Marcin Piechota
- Department of Molecular Neuropharmacology, Institute of Pharmacology PAS, Smetna 12, Krakow, 31-343, Poland
|
42
|
Abstract
Gene expression microarrays allow rapid and easy quantification of transcript accumulation for almost all transcripts present in a genome. This technology has been utilized for diverse investigations, from studying gene regulation in response to genetic or environmental fluctuation to global expression QTL (eQTL) analyses of natural variation. Typical analysis techniques focus on responses of individual genes in isolation from other genes. However, emerging evidence indicates that genes are organized into regulons, i.e., they respond as groups due to individual transcription factors binding multiple promoters, creating what is commonly called a network. We have developed a set of statistical approaches that allow researchers to test specific network hypotheses using a priori-defined gene networks. When applied to Arabidopsis thaliana, this approach has been able to identify natural genetic variation that controls networks. In this chapter we describe approaches to develop and test specific network hypotheses utilizing natural genetic variation. This approach can be expanded to facilitate direct tests of the relationship between phenotypic trait and transcript genetic architecture. Finally, the use of a priori network definitions can be applied to any microarray experiment to directly conduct hypothesis testing at a genomics level.
|
43
|
An expectation-maximization algorithm for the Lasso estimation of quantitative trait locus effects. Heredity (Edinb) 2010; 105:483-94. [PMID: 20051978 DOI: 10.1038/hdy.2009.180] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.3] [Indexed: 01/25/2023] Open
Abstract
The least absolute shrinkage and selection operator (Lasso) estimation of regression coefficients can be expressed as Bayesian posterior mode estimation of the regression coefficients under various hierarchical modeling schemes. A Bayesian hierarchical model requires hyper prior distributions. The regression coefficients are parameters of interest. The normal distribution assigned to each regression coefficient is a prior distribution. The variance parameter in the normal prior distribution is further assigned a hyper prior distribution so that the variance parameter can be estimated from the data. We developed an expectation-maximization (EM) algorithm to estimate the variance parameter of the prior distribution for each regression coefficient. Performance of the EM algorithm was evaluated through simulation study and real data analysis. We found that the Jeffreys' hyper prior for the variance component usually performs well with regard to generating the desired sparseness of the regression model. The EM algorithm can handle not only the usual regression models but it also conveniently deals with linear models in which predictors are defined as classification variables. In the context of quantitative trait loci (QTL) mapping, this new EM algorithm can estimate both genotypic values and QTL effects expressed as linear contrasts of the genotypic values.
|
44
|
Abstract
Since the introduction of genetical genomics in 2001, many studies have been published on various organisms, including mouse and rat. Genetical genomics makes use of the latest microarray profiling technologies and combines vast amounts of genotype and gene expression information, a strategy that has proven very successful in inbred line crosses. The data are analyzed using standard tools for linkage analysis to map the genetic determinants of gene expression variation. Typically, studies have singled out hundreds of genomic loci regulating the expression of nearby and distant genes (called local and distant expression quantitative trait loci, respectively; eQTLs). In this chapter, we provide a step-by-step guide to performing genome-wide linkage analysis in an eQTL mapping experiment by using the R statistical software framework.
|
45
|
Cui Y, Li G, Li S, Wu R. Designs for linkage analysis and association studies of complex diseases. Methods Mol Biol 2010; 620:219-242. [PMID: 20652506 DOI: 10.1007/978-1-60761-580-4_6] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
Genetic linkage analysis has been a traditional means for identifying regions of the genome with large genetic effects that contribute to a disease. Following linkage analysis, association studies are widely pursued to fine-tune regions with significant linkage signals. For complex diseases, which often involve multiple genetic variants each with a small or moderate effect, linkage analysis has little power compared to association studies. In this chapter, we give a brief review of design issues related to linkage analysis and association studies with human genetic data. We introduce methods commonly used for linkage and association studies and compare the relative merits of family-based and population-based association studies. Compared to candidate gene studies, a genome-wide blind search for disease variants is proving to be a more powerful approach. We briefly review the commonly used two-stage designs in genome-wide association studies. As more and more biological evidence indicates a role for genomic imprinting in disease, identifying imprinted genes becomes critically important. Design and analysis in the genetic mapping of imprinted genes are introduced in this chapter. Recent efforts in integrating gene expression analysis and genetic mapping, termed expression quantitative trait loci (eQTL) mapping or genetical genomics analysis, offer new prospects for elucidating the genetic architecture of gene expression. Designs in genetical genomics analysis are also covered in this chapter.
Affiliation(s)
- Yuehua Cui
- Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA
|
46
|
Lee E, Cho S, Kim K, Park T. An integrated approach to infer causal associations among gene expression, genotype variation, and disease. Genomics 2009; 94:269-77. [PMID: 19540336 DOI: 10.1016/j.ygeno.2009.06.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 11/13/2008] [Revised: 04/15/2009] [Accepted: 06/14/2009] [Indexed: 01/28/2023]
Abstract
Gene expression data and genotype variation data are now capable of providing genome-wide patterns across many different clinical conditions. However, the separate analysis of these data has limitations in elucidating the complex network of gene interactions underlying complex traits, such as common human diseases. More information about the identity of key driver genes of common diseases comes from integrating these two heterogeneous types of data. We developed a two-step procedure to characterize complex diseases by integrating genotype variation data and gene expression data. The first step elucidates the causal relationship among genetic variation, gene expression level, and disease. Based on the causal relationship determined in the first step, the second step identifies significant gene expression traits whose effects on disease status, or whose responses to disease status, are modified by the specific genotype variation. For the selected significant genes, a pathway enrichment analysis can be performed to identify the genetic mechanism of a complex disease. The proposed two-step procedure was shown to be an effective method for integrating three different levels of data, i.e., genotype variation, gene expression and disease status. By applying the proposed procedure to a chronic fatigue syndrome (CFS) dataset, we identified a list of potential causal genes for CFS, and found evidence for a difference in the genetic mechanisms of etiology between CFS without 'a major depressive disorder with melancholic features' (CFS) and CFS with 'a major depressive disorder with melancholic features' (CFS-MDD/m). In particular, SNPs within the NR3C1 gene were shown to differentially influence the susceptibility to developing CFS and CFS-MDD/m through integrative action with gene expression levels.
Affiliation(s)
- Eunjee Lee
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea.
|
47
|
Pan W. Network-based multiple locus linkage analysis of expression traits. Bioinformatics 2009; 25:1390-6. [PMID: 19336446 PMCID: PMC2682520 DOI: 10.1093/bioinformatics/btp177] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 11/16/2008] [Revised: 03/24/2009] [Accepted: 03/26/2009] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION We consider the problem of multiple locus linkage analysis for expression traits of genes in a pathway or a network. To capitalize on co-expression of functionally related genes, we propose a penalized regression method that maps multiple expression quantitative trait loci (eQTLs) for all related genes simultaneously while accounting for their shared functions as specified a priori by a gene pathway or network. RESULTS An analysis of a mouse dataset and simulation studies clearly demonstrate the advantage of the proposed method over a standard approach that ignores biological knowledge of gene networks.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455-0378, USA.
|
48
|
Expression quantitative trait loci mapping with multivariate sparse partial least squares regression. Genetics 2009; 182:79-90. [PMID: 19270271 DOI: 10.1534/genetics.109.100362] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.9] [Indexed: 11/18/2022] Open
Abstract
Expression quantitative trait loci (eQTL) mapping concerns finding genomic variation to elucidate variation of expression traits. This problem poses significant challenges due to high dimensionality of both the gene expression and the genomic marker data. We propose a multivariate response regression approach with simultaneous variable selection and dimension reduction for the eQTL mapping problem. Transcripts with similar expression are clustered into groups, and their expression profiles are viewed as a multivariate response. Then, we employ our recently developed sparse partial least-squares regression methodology to select markers associated with each cluster of genes. We demonstrate with extensive simulations that our eQTL mapping with multivariate response sparse partial least-squares regression (M-SPLS eQTL) method overcomes the issue of multiple transcript- or marker-specific analyses, thereby avoiding potential elevation of type I error. Additionally, joint analysis of multiple transcripts by multivariate response regression increases power for detecting weak linkages. We illustrate that M-SPLS eQTL compares competitively with other approaches and has a number of significant advantages, including the ability to handle highly correlated genotype data and computational efficiency. We provide an application of this methodology to a mouse data set concerning obesity and diabetes.
|
49
|
van Nas A, Guhathakurta D, Wang SS, Yehya N, Horvath S, Zhang B, Ingram-Drake L, Chaudhuri G, Schadt EE, Drake TA, Arnold AP, Lusis AJ. Elucidating the role of gonadal hormones in sexually dimorphic gene coexpression networks. Endocrinology 2009; 150:1235-49. [PMID: 18974276 PMCID: PMC2654741 DOI: 10.1210/en.2008-0563] [Citation(s) in RCA: 171] [Impact Index Per Article: 11.4] [Indexed: 11/19/2022]
Abstract
We previously used high-density expression arrays to interrogate a genetic cross between strains C3H/HeJ and C57BL/6J and observed thousands of differences in gene expression between sexes. We now report analyses of the molecular basis of these sex differences and of the effects of sex on gene expression networks. We analyzed liver gene expression of hormone-treated gonadectomized mice as well as XX male and XY female mice. Differences in gene expression resulted in large part from acute effects of gonadal hormones acting in adulthood, and the effects of sex chromosomes, apart from hormones, were modest. We also determined whether there are sex differences in the organization of gene expression networks in adipose, liver, skeletal muscle, and brain tissue. Although coexpression networks of highly correlated genes were largely conserved between sexes, some exhibited striking sex dependence. We observed strong body fat and lipid correlations with sex-specific modules in adipose and liver as well as a sexually dimorphic network enriched for genes affected by gonadal hormones. Finally, our analyses identified chromosomal loci regulating sexually dimorphic networks. This study indicates that gonadal hormones play a strong role in sex differences in gene expression. In addition, it results in the identification of sex-specific gene coexpression networks related to genetic and metabolic traits.
Affiliation(s)
- Atila van Nas
- Department of Human Genetics, University of California, Los Angeles, California 90095-1679, USA
|
50
|
Zhang D, Lin Y, Zhang M. Penalized orthogonal-components regression for large p small n data. Electron J Stat 2009. [DOI: 10.1214/09-ejs354] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022]
|