1
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. Genome Res 2025; 35:914-928. [PMID: 40113264 PMCID: PMC12047269 DOI: 10.1101/gr.279323.124] [Received: 03/15/2024] [Accepted: 01/06/2025] [Indexed: 03/22/2025]
Abstract
Rare structural variants (SVs), including insertions, deletions, and complex rearrangements, can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore Technologies long-read genomes of 68 individuals from the Undiagnosed Diseases Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4× increase from short reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably, these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that do not incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression toward improving the prioritization of functional SVs and TREs in rare disease patients.
Affiliation(s)
- Tanner D Jensen
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Bohan Ni
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Chloe M Reuter
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- John E Gorzynski
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Sarah Fazal
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33136, USA
- Devon Bonner
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Division of Medical Genetics, Stanford University School of Medicine, Stanford, California 94304, USA
- Rachel A Ungar
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Pagé C Goddard
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Archana Raja
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Euan A Ashley
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- Jonathan A Bernstein
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Pediatrics, Stanford University School of Medicine, Stanford, California 94304, USA
- Stephan Zuchner
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida 33136, USA
- Michael D Greicius
- Department of Neurology and Neurological Sciences, Stanford University School of Medicine, Stanford, California 94305, USA
- Stephen B Montgomery
- Department of Genetics, Stanford University, Stanford, California 94305, USA
- Department of Pathology, Stanford University, Stanford, California 94305, USA
- Department of Biomedical Data Science, Stanford University, Stanford, California 94305, USA
- Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Matthew T Wheeler
- Center for Undiagnosed Diseases, Stanford University, Stanford, California 94305, USA
- Department of Medicine, Division of Cardiovascular Medicine, Stanford University School of Medicine, Stanford, California 94305, USA
- GREGoR Stanford Site, Stanford University, Stanford, California 94305, USA
- Alexis Battle
- Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Department of Genetic Medicine, Johns Hopkins University, Baltimore, Maryland 21218, USA
2
Coulter A, Tong CY, Ni Y, Jiang Y. distQTL: Distribution Quantitative Trait Loci Identification by Population-Scale Single-Cell Data. bioRxiv 2025:2025.04.04.647121. [PMID: 40291679 PMCID: PMC12026582 DOI: 10.1101/2025.04.04.647121] [Indexed: 04/30/2025]
Abstract
Mapping expression quantitative trait loci (eQTLs) is a powerful method to study how genetic variation influences gene expression. Traditional bulk eQTL methods rely on averaged gene expression across a possibly heterogeneous mixture of cells, which can obscure underlying regulatory heterogeneity. Single-cell eQTL methods circumvent the averaging artifacts, providing an immense opportunity to interrogate transcriptional regulation at a much finer resolution. Recent developments in metric space regression methods allow the use of full empirical distributions as response objects instead of simple summary statistics such as mean. Here, we leverage Fréchet regression to identify distribution QTLs (distQTLs) using population-scale single-cell RNA sequencing data. We apply distQTL to the OneK1K cohort, consisting of scRNA-seq data of peripheral blood mononuclear cells from 982 donors, and compare results to various eQTL approaches based on summary statistics and mixed effects modeling. We demonstrate the superior performance of distQTL across different gene expression contexts compared to other methods and benchmark our results against findings from the Genotype-Tissue Expression Project. Finally, we orthogonally validate calls from distQTL using cell-type-specific epigenomic profiles.
3
Yildiz G, Zanini SF, Weber S, Kopalli V, Kox T, Abbadi A, Snowdon RJ, Golicz AA. Graphical pangenomics-enabled characterization of structural variant impact on gene expression in Brassica napus. Theor Appl Genet 2025; 138:91. [PMID: 40178590 PMCID: PMC11968540 DOI: 10.1007/s00122-025-04867-2] [Received: 11/19/2024] [Accepted: 02/25/2025] [Indexed: 04/05/2025]
Abstract
Key message: Pangenome graphs enable population-scale genotyping and improve expression analysis, revealing that structural variations (SVs), particularly transposable elements (TEs), significantly contribute to gene expression variation in winter oilseed rape. Structural variations (SVs) impact important traits, from yield to flowering behaviour and stress responses. Pangenome graphs capture population-level diversity, including SVs, within a single data structure and provide a robust framework for downstream applications. They have the potential to serve as unbiased references for SV genotyping, pan-transcriptomic analyses, and association studies, offering significant advantages over single reference genomes. However, their full potential for expression quantitative trait locus (eQTL) analysis is yet to be explored. We combined long and short-read whole genome sequencing data with expression profiling of Brassica napus (oilseed rape) to assess the impact of SVs on gene expression regulation and explored the utility of pangenome graphs for eQTL analysis. Over 90,000 SVs were discovered from 57 long-read datasets. The pangenome graph was evaluated as a reference and used for SV genotyping with short reads and transcript expression quantification. Using SVs genotyped from the graph and 100 expression datasets, we identified 267 gene-proximal (cis) SV-eQTLs. Over 70% of eQTL-SVs had similarity to transposable elements (TEs), especially Helitrons. The highest proportion of cis-eQTL-SVs was found in promoter regions. About a third of transcripts whose expression was associated with SVs had no associated SNPs, suggesting that including SVs captures relationships that would be missed in SNP-only analyses. This study demonstrated that pangenome graphs provide a unifying framework for eQTL analysis by allowing population-scale SV genotyping and gene expression quantification. We also showed that SVs make an appreciable contribution to gene expression variation in winter oilseed rape.
Affiliation(s)
- Gözde Yildiz
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
- Silvia F Zanini
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
- Sven Weber
- Department of Plant Breeding, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
- Venkataramana Kopalli
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
- Tobias Kox
- NPZ Innovation GmbH, Hohenlieth-Hof, 24363, Holtsee, Germany
- Amine Abbadi
- NPZ Innovation GmbH, Hohenlieth-Hof, 24363, Holtsee, Germany
- Rod J Snowdon
- Department of Plant Breeding, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
- Agnieszka A Golicz
- Department of Agrobioinformatics, IFZ Research Center for Biosystems, Land Use and Nutrition, Justus Liebig University, Heinrich Buff Ring 26-32, 35392, Giessen, Germany
4
Fang L, Teng J, Lin Q, Bai Z, Liu S, Guan D, Li B, Gao Y, Hou Y, Gong M, Pan Z, Yu Y, Clark EL, Smith J, Rawlik K, Xiang R, Chamberlain AJ, Goddard ME, Littlejohn M, Larson G, MacHugh DE, O'Grady JF, Sørensen P, Sahana G, Lund MS, Jiang Z, Pan X, Gong W, Zhang H, He X, Zhang Y, Gao N, He J, Yi G, Liu Y, Tang Z, Zhao P, Zhou Y, Fu L, Wang X, Hao D, Liu L, Chen S, Young RS, Shen X, Xia C, Cheng H, Ma L, Cole JB, Baldwin RL, Li CJ, Van Tassell CP, Rosen BD, Bhowmik N, Lunney J, Liu W, Guan L, Zhao X, Ibeagha-Awemu EM, Luo Y, Lin L, Canela-Xandri O, Derks MFL, Crooijmans RPMA, Gòdia M, Madsen O, Groenen MAM, Koltes JE, Tuggle CK, McCarthy FM, Rocha D, Giuffra E, Amills M, Clop A, Ballester M, Tosser-Klopp G, Li J, Fang C, Fang M, Wang Q, Hou Z, Wang Q, Zhao F, Jiang L, Zhao G, Zhou Z, Zhou R, Liu H, Deng J, Jin L, Li M, Mo D, Liu X, Chen Y, Yuan X, Li J, Zhao S, Zhang Y, Ding X, Sun D, Sun HZ, Li C, Wang Y, Jiang Y, Wu D, Wang W, Fan X, Zhang Q, Li K, Zhang H, Yang N, Hu X, Huang W, Song J, Wu Y, Yang J, Wu W, Kasper C, Liu X, Yu X, Cui L, Zhou X, Kim S, Li W, Im HK, Buckler ES, Ren B, Schatz MC, Li JJ, Palmer AA, Frantz L, Zhou H, Zhang Z, Liu GE. The Farm Animal Genotype-Tissue Expression (FarmGTEx) Project. Nat Genet 2025; 57:786-796. [PMID: 40097783 DOI: 10.1038/s41588-025-02121-5] [Received: 05/21/2024] [Accepted: 02/06/2025] [Indexed: 03/19/2025]
Abstract
Genetic mutation and drift, coupled with natural and human-mediated selection and migration, have produced a wide variety of genotypes and phenotypes in farmed animals. We here introduce the Farm Animal Genotype-Tissue Expression (FarmGTEx) Project, which aims to elucidate the genetic determinants of gene expression across 16 terrestrial and aquatic domestic species under diverse biological and environmental contexts. For each species, we aim to collect multiomics data, particularly genomics and transcriptomics, from 50 tissues of 1,000 healthy adults and 200 additional animals representing a specific context. This Perspective provides an overview of the priorities of FarmGTEx and advocates for coordinated strategies of data analysis and resource-sharing initiatives. FarmGTEx aims to serve as a platform for investigating context-specific regulatory effects, which will deepen our understanding of molecular mechanisms underlying complex phenotypes. The knowledge and insights provided by FarmGTEx will contribute to improving sustainable agriculture-based food systems, comparative biology and eventual human biomedicine.
Affiliation(s)
- Lingzhao Fang
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Jinyan Teng
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Qing Lin
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Zhonghao Bai
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Shuli Liu
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- School of Life Sciences, Westlake University, Hangzhou, China
- Dailu Guan
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Bingjie Li
- Department of Animal and Veterinary Sciences, Scotland's Rural College, Midlothian, UK
- Yahui Gao
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Yali Hou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Mian Gong
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhangyuan Pan
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Ying Yu
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, UK
- Jacqueline Smith
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, UK
- Konrad Rawlik
- Baillie Gifford Pandemic Science Hub, Centre for Inflammation Research, Institute for Regeneration and Repair, the University of Edinburgh, Edinburgh, UK
- Ruidong Xiang
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- Cambridge Baker Systems Genomics Initiative, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
- School of Agriculture, Food and Ecosystem Sciences, the University of Melbourne, Parkville, Victoria, Australia
- Amanda J Chamberlain
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, Victoria, Australia
- Michael E Goddard
- Agriculture Victoria Research, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia
- School of Agriculture, Food and Ecosystem Sciences, the University of Melbourne, Parkville, Victoria, Australia
- Mathew Littlejohn
- Research and Development, Livestock Improvement Corporation, Hamilton, New Zealand
- AL Rae Centre for Genetics and Breeding, Massey University, Palmerston North, New Zealand
- Greger Larson
- The Palaeogenomics and Bio-Archaeology Research Network, School of Archaeology, University of Oxford, Oxford, UK
- David E MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin, Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Dublin, Ireland
- UCD One Health Centre, University College Dublin, Belfield, Dublin, Ireland
- John F O'Grady
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Dublin, Ireland
- Peter Sørensen
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Goutam Sahana
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Mogens Sandø Lund
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- Zhihua Jiang
- Department of Animal Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, USA
- Xiangchun Pan
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Wentao Gong
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Haihan Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Xi He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yuebo Zhang
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Ning Gao
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Jun He
- College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Guoqiang Yi
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Yuwen Liu
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Zhonglin Tang
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
- Pengju Zhao
- Hainan Institute, Zhejiang University, Yongyou Industry Park, Yazhou Bay Sci-Tech City, Sanya, China
- Yang Zhou
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Yazhouwan National Laboratory, Sanya, China
- Liangliang Fu
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Xiao Wang
- Institute of Animal Science and Veterinary Medicine, Shandong Academy of Agricultural Sciences, Jinan, China
- Dan Hao
- Poultry Institute, Shandong Academy of Agricultural Sciences, Jinan, China
- Lei Liu
- Yazhouwan National Laboratory, Sanya, China
- Siqian Chen
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Robert S Young
- Usher Institute, University of Edinburgh, Edinburgh, UK
- Zhejiang University-University of Edinburgh Institute, Zhejiang University, Haining, P. R. China
- Xia Shen
- Usher Institute, University of Edinburgh, Edinburgh, UK
- State Key Laboratory of Genetic Engineering, Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China
- Center for Intelligent Medicine Research, Greater Bay Area Institute of Precision Medicine (Guangzhou), Fudan University, Guangzhou, China
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Charley Xia
- Lothian Birth Cohort studies, University of Edinburgh, Edinburgh, UK
- Department of Psychology, University of Edinburgh, Edinburgh, UK
- Hao Cheng
- Department of Animal Science, University of California, Davis, Davis, CA, USA
- Li Ma
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
- John B Cole
- Council on Dairy Cattle Breeding, Bowie, MD, USA
- Department of Animal Sciences, Donald Henry Barron Reproductive and Perinatal Biology Research Program and the Genetics Institute, University of Florida, Gainesville, FL, USA
- Department of Animal Science, North Carolina State University, Raleigh, NC, USA
- Ransom L Baldwin
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Cong-Jun Li
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Curtis P Van Tassell
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Benjamin D Rosen
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Nayan Bhowmik
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA
- Joan Lunney
- Animal Parasitic Diseases Laboratory, BARC, NEA, ARS, USDA, Beltsville, MD, USA
- Wansheng Liu
- Department of Animal Science, Center for Reproductive Biology and Health, College of Agricultural Sciences, the Pennsylvania State University, University Park, PA, USA
- Leluo Guan
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, Alberta, Canada
- Faculty of Land and Food Systems, University of British Columbia, Vancouver, British Columbia, Canada
- Xin Zhao
- Department of Animal Science, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada
- Eveline M Ibeagha-Awemu
- Sherbrooke Research and Development Centre, Agriculture and Agri-Food Canada, Sherbrooke, Quebec, Canada
- Yonglun Luo
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
- Lin Lin
- Department of Biomedicine, Aarhus University, Aarhus, Denmark
- Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
- Oriol Canela-Xandri
- MRC Human Genetics Unit at the Institute of Genetics and Cancer, the University of Edinburgh, Edinburgh, UK
- Martijn F L Derks
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Marta Gòdia
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Ole Madsen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- Martien A M Groenen
- Animal Breeding and Genomics, Wageningen University & Research, Wageningen, the Netherlands
- James E Koltes
- Department of Animal Science, Iowa State University, Ames, IA, USA
- Dominique Rocha
- GABI, AgroParisTech, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Elisabetta Giuffra
- GABI, AgroParisTech, INRAE, Paris-Saclay University, Jouy-en-Josas, France
- Marcel Amills
- Department of Animal Genetics, Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- Departament de Ciència Animal i dels Aliments, Universitat Autònoma de Barcelona, Bellaterra, Spain
- Alex Clop
- Department of Animal Genetics, Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Campus de la Universitat Autònoma de Barcelona, Bellaterra, Spain
- Consejo Superior de Investigaciones Científicas, Barcelona, Spain
- Maria Ballester
- Animal Breeding and Genetics Programme, Institut de Recerca i Tecnologia Agroalimentàries (IRTA), Torre Marimon, Caldes de Montbui, Spain
- Jing Li
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- School of Agriculture and Life Sciences, Kunming University, Kunming, China
- Chao Fang
- LC-Bio Technologies, Co., Ltd, Hangzhou, China
- Ming Fang
- Key Laboratory of Healthy Mariculture for the East China Sea, Ministry of Agriculture and Rural Affairs, Jimei University, Xiamen, China
- Qishan Wang
- College of Animal Sciences, Zhejiang University, Hangzhou, China
- Zhuocheng Hou
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Qin Wang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Fuping Zhao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Lin Jiang
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Guiping Zhao
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Zhengkui Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Rong Zhou
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing, China
- Hehe Liu
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
- Juan Deng
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
- Long Jin
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
- Mingzhou Li
- College of Animal Science and Technology, Sichuan Agricultural University, Chengdu, China
- Delin Mo
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Xiaohong Liu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yaosheng Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Xiaolong Yuan
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Jiaqi Li
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China
- Shuhong Zhao
- Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of the Ministry of Education, Huazhong Agricultural University, Wuhan, China
- Yazhouwan National Laboratory, Sanya, China
- Yi Zhang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Xiangdong Ding
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
- Dongxiao Sun
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Hui-Zeng Sun
- Key Laboratory of Dairy Cow Genetic Improvement and Milk Quality Research of Zhejiang Province, College of Animal Sciences, Zhejiang University, Hangzhou, China
| | - Cong Li
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yu Wang
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Yu Jiang
- College of Animal Science and Technology, Northwest A&F University, Yangling, China
| | - Dongdong Wu
- Key Laboratory of Genetic Evolution and Animal Models, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, China
| | - Wenwen Wang
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Xinzhong Fan
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Qin Zhang
- Shandong Provincial Key Laboratory for Livestock Germplasm Innovation and Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, China
| | - Kui Li
- Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Key Laboratory of Livestock and Poultry Multi-omics of MARA, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China
| | - Hao Zhang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Ning Yang
- National Engineering Laboratory for Animal Breeding, State Key Laboratory of Animal Biotech Breeding, Key Laboratory of Animal Genetics, Breeding and Reproduction of the Ministry of Agriculture and Rural Affairs, College of Animal Science and Technology, China Agricultural University, Beijing, China
| | - Xiaoxiang Hu
- State Key Laboratory of Animal Biotech Breeding, College of Biological Sciences, China Agricultural University, Beijing, China
| | - Wen Huang
- Department of Animal Science, Michigan State University, East Lansing, MI, USA
| | - Jiuzhou Song
- Department of Animal and Avian Sciences, University of Maryland, College Park, MD, USA
| | - Yang Wu
- Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China
| | - Jian Yang
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, China
- School of Life Sciences, Westlake University, Hangzhou, China
| | - Weiwei Wu
- Institute of Animal Science, Xinjiang Academy of Animal Science, Ürümqi City, China
| | - Claudia Kasper
- Animal GenoPhenomics, Animal Production Systems and Animal Health, Agroscope Posieux, Fribourg, Switzerland
| | - Xinfeng Liu
- Center for Quantitative Genetics and Genomics, Aarhus University, Aarhus, Denmark
- State Key Laboratory of Herbage Improvement and Grassland Agro-ecosystem, College of Ecology, Lanzhou University, Lanzhou, China
| | - Xiaofei Yu
- College of Marine Life Sciences, Ocean University of China, Qingdao, China
| | - Leilei Cui
- School of Life Sciences, Nanchang University, Nanchang, China
- Jiangxi Province Key Laboratory of Aging and Disease, Human Aging Research Institute and School of Life Science, Nanchang University, Jiangxi, China
| | - Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Seyoung Kim
- Department of Epidemiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Wei Li
- Division of Computational Biomedicine, Department of Biological Chemistry, School of Medicine, University of California, Irvine, Irvine, CA, USA
| | - Hae Kyung Im
- Department of Medicine and Human Genetics, the University of Chicago, Chicago, IL, USA
| | - Edward S Buckler
- Section of Plant Breeding and Genetics, Cornell University, Ithaca, NY, USA
- Institute for Genomic Diversity, Cornell University, Ithaca, NY, USA
- Agricultural Research Service, United States Department of Agriculture, Ithaca, NY, USA
| | - Bing Ren
- Department of Cellular and Molecular Medicine, Center for Epigenomics, Moores Cancer Center and Institute of Genomic Medicine, University of California San Diego, School of Medicine, La Jolla, CA, USA
| | - Michael C Schatz
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA.
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA.
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA, USA.
| | - Laurent Frantz
- Palaeogenomics Group, Institute of Palaeoanatomy, Domestication Research and the History of Veterinary Medicine, Ludwig-Maximilians-Universität, Munich, Germany.
- School of Biological and Behavioural Sciences, Queen Mary University of London, London, UK.
| | - Huaijun Zhou
- Department of Animal Science, University of California, Davis, Davis, CA, USA.
| | - Zhe Zhang
- State Key Laboratory of Swine and Poultry Breeding Industry, National Engineering Research Center for Breeding Swine Industry, Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, College of Animal Science, South China Agricultural University, Guangzhou, China.
| | - George E Liu
- Animal Genomics and Improvement Laboratory, Henry A. Wallace Beltsville Agricultural Research Center, Agricultural Research Service, USDA, Beltsville, MD, USA.
5
|
O'Grady JF, McHugo GP, Ward JA, Hall TJ, Faherty O'Donnell SL, Correia CN, Browne JA, McDonald M, Gormley E, Riggio V, Prendergast JGD, Clark EL, Pausch H, Meade KG, Gormley IC, Gordon SV, MacHugh DE. Integrative genomics sheds light on the immunogenetics of tuberculosis in cattle. Commun Biol 2025; 8:479. [PMID: 40128580 PMCID: PMC11933339 DOI: 10.1038/s42003-025-07846-x] [Received: 09/24/2024] [Accepted: 02/27/2025] [Indexed: 03/26/2025] Open
Abstract
Mycobacterium bovis causes bovine tuberculosis (bTB), an infectious disease of cattle that represents a zoonotic threat to humans. Research has shown that the peripheral blood (PB) transcriptome is perturbed during bTB disease, but the genomic architecture underpinning this transcriptional response remains poorly understood. Here, we analyse PB transcriptomics data from 63 control and 60 confirmed M. bovis-infected animals and detect 2592 differentially expressed genes perturbing multiple immune response pathways. Leveraging imputed genome-wide SNP data, we characterise thousands of cis-expression quantitative trait loci (eQTLs) and show that the PB transcriptome is substantially impacted by intrapopulation genomic variation during M. bovis infection. Integrating our cis-eQTL data with bTB susceptibility GWAS summary statistics, we perform a transcriptome-wide association study and identify 115 functionally relevant genes (including RGS10, GBP4, TREML2, and RELT), providing important new omics data for understanding the host response to mycobacterial infections that cause tuberculosis in mammals.
Affiliation(s)
- John F O'Grady
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - Gillian P McHugo
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - James A Ward
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - Thomas J Hall
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - Sarah L Faherty O'Donnell
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
- Irish Blood Transfusion Service, National Blood Centre, James's Street, Dublin, Ireland
| | - Carolina N Correia
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
- Children's Health Ireland, 32 James's Walk, Rialto, Ireland
| | - John A Browne
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - Michael McDonald
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
| | - Eamonn Gormley
- UCD School of Veterinary Medicine, University College Dublin, Belfield, Ireland
- UCD One Health Centre, University College Dublin, Belfield, Ireland
| | - Valentina Riggio
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, UK
| | - James G D Prendergast
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, UK
| | - Emily L Clark
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Midlothian, UK
- Centre for Tropical Livestock Genetics and Health (CTLGH), Roslin Institute, University of Edinburgh, Easter Bush Campus, Midlothian, UK
| | - Hubert Pausch
- Animal Genomics, ETH Zurich, Universitaetstrasse 2, Zurich, Switzerland
| | - Kieran G Meade
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland
- UCD One Health Centre, University College Dublin, Belfield, Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Ireland
| | - Isobel C Gormley
- UCD School of Mathematics and Statistics, University College Dublin, Belfield, Ireland
| | - Stephen V Gordon
- UCD School of Veterinary Medicine, University College Dublin, Belfield, Ireland
- UCD One Health Centre, University College Dublin, Belfield, Ireland
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Ireland
| | - David E MacHugh
- UCD School of Agriculture and Food Science, University College Dublin, Belfield, Ireland.
- UCD One Health Centre, University College Dublin, Belfield, Ireland.
- UCD Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Belfield, Ireland.
6
|
Everson TM, Sehgal N, Campbell K, Barr DB, Panuwet P, Yakimavets V, Chen K, Perez C, Shankar K, Eick SM, Pearson KJ, Andres A. Placental PFAS concentrations are associated with perturbations of placental DNA methylation. Environmental Pollution (Barking, Essex: 1987) 2025; 368:125737. [PMID: 39862910 DOI: 10.1016/j.envpol.2025.125737] [Received: 09/05/2024] [Revised: 01/12/2025] [Accepted: 01/21/2025] [Indexed: 01/27/2025]
Abstract
The placenta is crucial for fetal development and is affected by PFAS toxicity, and evidence is accumulating that gestational PFAS perturb the epigenetic activity of the placenta. Gestational PFAS exposure can adversely affect offspring, yet individual and cumulative impacts of PFAS on the placental epigenome remain underexplored. Here, we conducted an epigenome-wide association study (EWAS) to examine the relationships between placental PFAS levels and DNA methylation in a cohort of mother-infant dyads in Arkansas (N = 151). We measured 17 PFAS in human placental tissues and quantified placental DNA methylation levels via the Illumina EPIC Microarray. We tested for differential DNA methylation with individual PFAS, and with mixtures of multiple PFAS. Our results demonstrated that numerous epigenetic loci were perturbed by PFAS, with PFHxS exhibiting the most abundant effects. Mixture analyses suggested cumulative effects of PFOA and PFOS, while PFHxS may act more independently. We additionally explored whether sex-specific effects may be present and concluded that future large studies should explicitly test for sex-specific effects. The genes that are annotated to our PFAS-associated epigenetic loci are primarily involved in growth processes and cardiometabolic health, while some genes are involved in neurodevelopment. These findings shed light on how prenatal PFAS exposures affect birth outcomes and children's health, emphasizing the importance of understanding PFAS mechanisms in the in-utero environment.
Affiliation(s)
- Todd M Everson
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA.
| | - Neha Sehgal
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Kyle Campbell
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Dana Boyd Barr
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Parinya Panuwet
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Volha Yakimavets
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Kelsey Chen
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Cynthia Perez
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Kartik Shankar
- USDA Agricultural Research Service, Responsive Agricultural Food Systems Research Unit, College Station, TX, USA
| | - Stephanie M Eick
- Gangarosa Department of Environmental Health, Emory University Rollins School of Public Health, Atlanta, GA, USA; Department of Epidemiology, Emory University Rollins School of Public Health, Atlanta, GA, USA
| | - Kevin J Pearson
- Department of Pharmacology & Nutritional Sciences, University of Kentucky College of Medicine, USA
| | - Aline Andres
- Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, AR, USA; Arkansas Children's Nutrition Center, Little Rock, AR, USA
7
|
Santhanam N, Sanchez-Roige S, Mi S, Liang Y, Chitre AS, Munro D, Chen D, Gao J, Garcia-Martinez A, George AM, Gileta AF, Han W, Holl K, Hughson A, King CP, Lamparelli AC, Martin CD, Nyasimi F, St. Pierre CL, Sumner S, Tripi J, Wang T, Chen H, Flagel S, Ishiwari K, Meyer P, Polesskaya O, Saba L, Solberg Woods LC, Palmer AA, Im HK. RatXcan: A framework for cross-species integration of genome-wide association and gene expression data. PLoS Genet 2025; 21:e1011583. [PMID: 40163524 PMCID: PMC12052193 DOI: 10.1371/journal.pgen.1011583] [Received: 06/21/2024] [Revised: 05/05/2025] [Accepted: 01/20/2025] [Indexed: 04/02/2025] Open
Abstract
Genome-wide association studies (GWAS) have implicated specific alleles and genes as risk factors for numerous complex traits. However, translating GWAS results into biologically and therapeutically meaningful discoveries remains extremely challenging. Most GWAS results identify noncoding regions of the genome, suggesting that differences in gene regulation are the major driver of trait variability. To better integrate GWAS results with gene regulatory polymorphisms, we previously developed PrediXcan (also known as "transcriptome-wide association studies" or TWAS), which maps SNPs to predicted gene expression using GWAS data. In this study, we developed RatXcan, a framework that extends this methodology to outbred heterogeneous stock (HS) rats. RatXcan accounts for the close familial relationships among HS rats by modeling the relatedness with a random effect that encodes the genetic relatedness. RatXcan also corrects for polygenic-driven inflation because of the equivalence between a relatedness random effect and the infinitesimal polygenic model. To develop RatXcan, we trained transcript predictors for 8,934 genes using reference genotype and expression data from five rat brain regions. We found that the cis genetic architecture of gene expression in both rats and humans was sparse and similar across brain tissues. We tested the association between predicted expression in rats and two example traits (body length and BMI) using phenotype and genotype data from 5,401 densely genotyped HS rats and identified a significant enrichment between the genes associated with rat and human body length and BMI. Thus, RatXcan represents a valuable tool for identifying the relationship between gene expression and phenotypes across species and paves the way to explore shared biological mechanisms of complex traits.
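The prediction-then-association idea summarized in this abstract can be illustrated with a minimal sketch. It assumes hypothetical SNP names, weights, genotype dosages, and trait values, and it uses a plain Pearson correlation for the association step. It deliberately omits the relatedness random effect that the abstract describes as central to RatXcan, so it is not the authors' method.

```python
from math import sqrt

def predict_expression(dosages, weights):
    """Predicted expression as a weighted sum of SNP dosages (0, 1, or 2)."""
    return sum(w * dosages.get(snp, 0.0) for snp, w in weights.items())

def pearson(x, y):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical prediction weights for one gene (illustrative values only).
weights = {"snp1": 0.5, "snp2": -0.2}

# Hypothetical genotype dosages for four animals.
animals = [
    {"snp1": 0, "snp2": 2},
    {"snp1": 1, "snp2": 1},
    {"snp1": 2, "snp2": 0},
    {"snp1": 2, "snp2": 1},
]

# Hypothetical trait values for the same four animals.
trait = [10.0, 11.0, 12.5, 12.0]

predicted = [predict_expression(a, weights) for a in animals]
r = pearson(predicted, trait)
print(predicted, round(r, 3))
```

In a setting with closely related animals, a naive correlation like this would tend to be inflated, which is the motivation the abstract gives for adding a random effect that encodes genetic relatedness.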
Affiliation(s)
- Natasha Santhanam
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America
| | - Sandra Sanchez-Roige
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, United States of America
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Sabrina Mi
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Yanyu Liang
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America
| | - Apurva S. Chitre
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Daniel Munro
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Denghui Chen
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Jianjun Gao
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Angel Garcia-Martinez
- University of Tennessee Health Science Center, Department of Pharmacology, Addiction Science and Toxicology, Memphis, Tennessee, United States of America
| | - Anthony M. George
- University at Buffalo, Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, New York, United States of America
| | - Alexander F. Gileta
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Wenyan Han
- University of Tennessee Health Science Center, Department of Pharmacology, Addiction Science and Toxicology, Memphis, Tennessee, United States of America
| | - Katie Holl
- Medical College of Wisconsin, Department of Pediatrics, Milwaukee, Wisconsin, United States of America
| | - Alesa Hughson
- University of Michigan, Department of Psychiatry, Ann Arbor, Michigan, United States of America
| | - Christopher P. King
- University at Buffalo, Department of Psychology, Buffalo, New York, United States of America
| | - Alexander C. Lamparelli
- University at Buffalo, Department of Psychology, Buffalo, New York, United States of America
| | - Connor D. Martin
- University at Buffalo, Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, New York, United States of America
| | - Festus Nyasimi
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America
| | - Celine L. St. Pierre
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Sarah Sumner
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America
| | - Jordan Tripi
- University at Buffalo, Department of Psychology, Buffalo, New York, United States of America
| | - Tengfei Wang
- University of Tennessee Health Science Center, Department of Pharmacology, Addiction Science and Toxicology, Memphis, Tennessee, United States of America
| | - Hao Chen
- University of Tennessee Health Science Center, Department of Pharmacology, Addiction Science and Toxicology, Memphis, Tennessee, United States of America
| | - Shelly Flagel
- University of Michigan, Department of Psychiatry, Ann Arbor, Michigan, United States of America
| | - Keita Ishiwari
- University at Buffalo, Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, New York, United States of America
- University at Buffalo, Department of Pharmacology and Toxicology, Buffalo, New York, United States of America
| | - Paul Meyer
- University at Buffalo, Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, New York, United States of America
- University at Buffalo, Department of Psychology, Buffalo, New York, United States of America
| | - Oksana Polesskaya
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
| | - Laura Saba
- University of Colorado Anschutz Medical Campus, Department of Pharmaceutical Sciences, Aurora, Colorado, United States of America
| | - Leah C. Solberg Woods
- Wake Forest University School of Medicine, Department of Internal Medicine, Winston-Salem, North Carolina, United States of America
| | - Abraham A. Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Institute for Genomic Medicine, University of California San Diego, La Jolla, California, United States of America
| | - Hae Kyung Im
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, Illinois, United States of America
8
|
Choi YA, Kim Y, Miao P, Lappalainen T, Gürsoy G. Secure and federated quantitative trait loci mapping with privateQTL. Cell Genomics 2025; 5:100769. [PMID: 39947138 PMCID: PMC11872535 DOI: 10.1016/j.xgen.2025.100769] [Received: 08/28/2024] [Revised: 12/04/2024] [Accepted: 01/15/2025] [Indexed: 03/05/2025]
Abstract
Understanding the relationship between genotypes and phenotypes is crucial for advancing personalized medicine. Expression quantitative trait loci (eQTL) mapping plays a significant role by correlating genetic variants to gene expression levels. Despite the progress made by large-scale projects, eQTL mapping still faces challenges in statistical power and privacy concerns. Multi-site studies can increase sample sizes but are hindered by privacy issues. We present privateQTL, a novel framework leveraging secure multi-party computation for secure and federated eQTL mapping. When tested in a real-world scenario with data from different studies, privateQTL outperformed meta-analysis by accurately correcting for covariates and batch effect and retaining higher accuracy and precision for both eGene-eVariant mapping and effect size estimation. In addition, privateQTL is modular and scalable, making it adaptable for other molecular phenotypes and large-scale studies. Our results indicate that privateQTL is a practical solution for privacy-preserving collaborative eQTL mapping.
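As a rough illustration of what secure multi-party computation means in a multi-site setting, the sketch below uses additive secret sharing to pool one per-site summary statistic without any site revealing its own value. The modulus, function names, and per-site numbers are all assumptions made for illustration; the abstract does not describe privateQTL's actual protocol, and this sketch should not be read as such.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares (an illustrative choice)

def share(value, n_parties):
    """Split a non-negative integer into n additive shares modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine additive shares into the original value."""
    return sum(shares) % PRIME

def secure_sum(site_values):
    """Pool private site values: only sums of shares are ever published."""
    n = len(site_values)
    all_shares = [share(v, n) for v in site_values]
    # Party j holds the j-th share from every site and publishes their sum.
    partial = [sum(s[j] for s in all_shares) % PRIME for j in range(n)]
    return reconstruct(partial)

# Hypothetical per-site statistics for one variant-gene pair, for example
# an integer-scaled sum of genotype times expression at each site.
site_sum_xy = [120, 75, 310]
total = secure_sum(site_sum_xy)
print(total)
```

Each individual share is a uniformly random number modulo `PRIME`, so a single party learns nothing about a site's value from the shares it holds; only the pooled total is recovered.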
Affiliation(s)
- Yoolim Annie Choi
- Columbia University, Department of Biomedical Informatics, New York, NY, USA; New York Genome Center, New York, NY, USA
| | - Yebin Kim
- New York Genome Center, New York, NY, USA
| | - Peihan Miao
- Brown University, Department of Computer Science, Providence, RI, USA
| | - Tuuli Lappalainen
- New York Genome Center, New York, NY, USA; Science for Life Laboratory, Department of Gene Technology, KTH Royal Institute of Technology, Solna, Sweden
| | - Gamze Gürsoy
- Columbia University, Department of Biomedical Informatics, New York, NY, USA; New York Genome Center, New York, NY, USA; Department of Computer Science, Columbia University, New York, NY, USA.
9
|
Torres-Rodríguez JV, Li D, Schnable JC. Evolving best practices for transcriptome-wide association studies accelerate discovery of gene-phenotype links. Current Opinion in Plant Biology 2025; 83:102670. [PMID: 39626491 DOI: 10.1016/j.pbi.2024.102670] [Received: 07/01/2024] [Revised: 10/20/2024] [Accepted: 11/01/2024] [Indexed: 02/01/2025]
Abstract
Transcriptome-wide association studies (TWAS) complement genome-wide association studies (GWAS) by using gene expression data to link specific genes to phenotypes. This review examines 37 TWAS studies across eight plant species, evaluating the impact of methodological choices on outcomes using maize and soybean datasets. Large sample sizes and synchronized sample collection for gene expression measurement appear to significantly increase power for discovering gene-phenotype linkages, while matching tissue, stage, and environment may matter much less than previously believed, making it feasible to reuse large and well-collected expression datasets across multiple studies. The development of statistical approaches and computational tools specifically optimized for plant TWAS data will ultimately be needed, but further potential remains to adapt advances developed in GWAS to TWAS contexts.
Affiliation(s)
- J Vladimir Torres-Rodríguez
- Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
| | - Delin Li
- Xianghu Laboratory, Hangzhou, 311231, China
| | - James C Schnable
- Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA.
10
|
Popp JM, Rhodes K, Jangi R, Li M, Barr K, Tayeb K, Battle A, Gilad Y. Cell type and dynamic state govern genetic regulation of gene expression in heterogeneous differentiating cultures. Cell Genomics 2024; 4:100701. [PMID: 39626676 DOI: 10.1016/j.xgen.2024.100701] [Received: 04/30/2024] [Revised: 09/18/2024] [Accepted: 11/05/2024] [Indexed: 12/11/2024]
Abstract
Identifying the molecular effects of human genetic variation across cellular contexts is crucial for understanding the mechanisms underlying disease-associated loci, yet many cell types and developmental stages remain underexplored. Here, we harnessed the potential of heterogeneous differentiating cultures (HDCs), an in vitro system in which pluripotent cells asynchronously differentiate into a broad spectrum of cell types. We generated HDCs for 53 human donors and collected single-cell RNA sequencing data from over 900,000 cells. We identified expression quantitative trait loci in 29 cell types and characterized regulatory dynamics across diverse differentiation trajectories. This revealed novel regulatory variants for genes involved in key developmental and disease-related processes while replicating known effects from primary tissues and dynamic regulatory effects associated with a range of complex traits.
Affiliation(s)
- Joshua M Popp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Katherine Rhodes
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Radhika Jangi
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Mingyuan Li
- Department of Biology, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Kenneth Barr
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Karl Tayeb
- Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, IL 60637, USA
| | - Alexis Battle
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA; Department of Genetic Medicine, Johns Hopkins University, Baltimore, MD 21218, USA.
| | - Yoav Gilad
- Department of Medicine, University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA.
11
|
Zhang S, Song Q, Zhang P, Wang X, Guo R, Li Y, Liu S, Yan X, Zhang J, Niu Y, Shi Y, Song T, Xu T, He S. Genome-wide investigation of VNTR motif polymorphisms in 8,222 genomes: Implications for biological regulation and human traits. Cell Genomics 2024; 4:100699. [PMID: 39609246 DOI: 10.1016/j.xgen.2024.100699] [Received: 04/08/2024] [Revised: 08/31/2024] [Accepted: 11/01/2024] [Indexed: 11/30/2024]
Abstract
Variable number tandem repeats (VNTRs) are a pervasive and highly mutable genetic feature that varies in both length and repeat sequence. Although VNTR copy-number variants are well studied, the functional impacts of repeat motif polymorphisms remain unknown. Here, we present the largest genome-wide VNTR polymorphism map to date, with over 2.5 million VNTR length polymorphisms (VNTR-LPs) and over 11 million VNTR motif polymorphisms (VNTR-MPs) detected in 8,222 high-coverage genomes. Leveraging the large-scale NyuWa cohort, we identified 2,982,456 (31.8%) NyuWa-specific VNTR-MPs, of which 95.3% were rare. Moreover, we found 1,937 out of 38,685 VNTRs that were associated with gene expression through VNTR-MPs in lymphoblastoid cell lines. Specifically, we clarified that the expansion of a likely causal motif could upregulate gene expression by improving the binding concentration of PU.1. We also explored the potential impacts of VNTR polymorphisms on phenotypic differentiation and disease susceptibility. This study expands our knowledge of VNTR-MPs and their functional implications.
Affiliation(s)
- Sijia Zhang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; Department of Scientific Research, Jiangsu Cancer Hospital & Jiangsu Institute of Cancer Research & Affiliated Cancer Hospital of Nanjing Medical University, Nanjing, China
| | - Qiao Song
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Peng Zhang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaona Wang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
| | - Rong Guo
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yanyan Li
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Shuai Liu
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Xiaoyu Yan
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Jingjing Zhang
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yiwei Niu
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Yirong Shi
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tingrui Song
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China
- Tao Xu
- College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; National Laboratory of Biomacromolecules, CAS Center for Excellence in Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; Shandong First Medical University & Shandong Academy of Medical Sciences, Jinan, Shandong 250117, China.
- Shunmin He
- Key Laboratory of Epigenetic Regulation and Intervention, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.
12
Ko BS, Lee SB, Kim TK. A brief guide to analyzing expression quantitative trait loci. Mol Cells 2024; 47:100139. [PMID: 39447874 PMCID: PMC11600780 DOI: 10.1016/j.mocell.2024.100139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Revised: 10/14/2024] [Accepted: 10/17/2024] [Indexed: 10/26/2024] Open
Abstract
Molecular quantitative trait locus (molQTL) mapping has emerged as an important approach for elucidating the functional consequences of genetic variants and unraveling the causal mechanisms underlying diseases or complex traits. However, the variety of analysis tools and sophisticated methodologies available for molQTL studies can be overwhelming for researchers with limited computational expertise. Here, we provide a brief guideline with a curated list of methods and software tools for analyzing expression quantitative trait loci, the most widely studied type of molQTL.
Affiliation(s)
- Byung Su Ko
- Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Sung Bae Lee
- Department of Brain Sciences, DGIST, Daegu 42988, Republic of Korea
- Tae-Kyung Kim
- Department of Life Sciences, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea; Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Seoul 03722, Republic of Korea.
13
Leblanc FJA, Jin X, Kang K, Lee CJM, Xu J, Xuan L, Ma W, Belhaj H, Benzaki M, Mehta N, Foo RSY, Reilly S, Anene-Nzelu CG, Pan Z, Nattel S, Yang B, Lettre G. Atrial fibrillation variant-to-gene prioritization through cross-ancestry eQTL and single-nucleus multiomic analyses. iScience 2024; 27:110660. [PMID: 39262787 PMCID: PMC11388022 DOI: 10.1016/j.isci.2024.110660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 03/28/2024] [Accepted: 07/31/2024] [Indexed: 09/13/2024] Open
Abstract
Atrial fibrillation (AF) is the most common arrhythmia in the world. Human genetics can provide strong AF therapeutic candidates, but the identification of the causal genes and their functions remains challenging. Here, we applied an AF fine-mapping strategy that leverages results from a previously published cross-ancestry genome-wide association study (GWAS), expression quantitative trait loci (eQTLs) from left atrial appendages (LAAs) obtained from two cohorts with distinct ancestry, and a paired RNA sequencing (RNA-seq) and ATAC sequencing (ATAC-seq) LAA single-nucleus assay (sn-multiome). At nine AF loci, our co-localization and fine-mapping analyses implicated 14 genes. Data integration identified several candidate causal AF variants, including rs7612445 at GNB4 and rs242557 at MAPT. Finally, we showed that the repression of the strongest AF-associated eQTL gene, LINC01629, in human embryonic stem cell-derived cardiomyocytes using CRISPR inhibition results in the dysregulation of pathways linked to genes involved in the development of atrial tissue and the cardiac conduction system.
Affiliation(s)
- Francis J A Leblanc
- Montreal Heart Institute, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Xuexin Jin
- Department of Pharmacology (State Key Laboratory of Frigid Zone Cardiovascular Disease, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, Heilongjiang 150086, P.R. China
- Department of Cardiology, The First Affiliated Hospital, Harbin Medical University, Harbin 150001, China
- Kai Kang
- Department of Cardiovascular Surgery, The First Affiliated Hospital, Harbin Medical University, Harbin 150001, China
- Chang Jie Mick Lee
- Cardiovascular Disease Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Juan Xu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Lina Xuan
- Department of Pharmacology (State Key Laboratory of Frigid Zone Cardiovascular Disease, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, Heilongjiang 150086, P.R. China
- Wenbo Ma
- Department of Pharmacology (State Key Laboratory of Frigid Zone Cardiovascular Disease, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, Heilongjiang 150086, P.R. China
- Marouane Benzaki
- Montreal Heart Institute, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Neelam Mehta
- Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Roger Sik Yin Foo
- Cardiovascular Disease Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Svetlana Reilly
- Division of Cardiovascular Medicine, Radcliffe Department of Medicine, University of Oxford, John Radcliffe Hospital, Oxford, UK
- Chukwuemeka George Anene-Nzelu
- Montreal Heart Institute, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Cardiovascular Disease Translational Research Programme, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore
- Zhenwei Pan
- Department of Pharmacology (State Key Laboratory of Frigid Zone Cardiovascular Disease, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, Heilongjiang 150086, P.R. China
- Stanley Nattel
- Montreal Heart Institute, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montréal, QC, Canada
- Department of Pharmacology and Therapeutics, McGill University, Montreal, QC, Canada
- IHU Liryc and Fondation Bordeaux Université, Bordeaux, France
- Institute of Pharmacology, West German Heart and Vascular Center, Faculty of Medicine, University Duisburg-Essen, Essen, Germany
- Baofeng Yang
- Department of Pharmacology (State Key Laboratory of Frigid Zone Cardiovascular Disease, Key Laboratory of Cardiovascular Research, Ministry of Education), College of Pharmacy, Harbin Medical University, Harbin, Heilongjiang 150086, P.R. China
- Guillaume Lettre
- Montreal Heart Institute, Montreal, QC, Canada
- Department of Medicine, Université de Montréal, Montréal, QC, Canada
14
Chitneedi PK, Hadlich F, Moreira GCM, Espinosa-Carrasco J, Li C, Plastow G, Fischer D, Charlier C, Rocha D, Chamberlain AJ, Kuehn C. eQTL-Detect: nextflow-based pipeline for eQTL detection in modular format with sharable and parallelizable scripts. NAR Genom Bioinform 2024; 6:lqae122. [PMID: 39318506 PMCID: PMC11420669 DOI: 10.1093/nargab/lqae122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 07/26/2024] [Accepted: 09/02/2024] [Indexed: 09/26/2024] Open
Abstract
Bioinformatic pipelines are becoming increasingly complex with the ever-accumulating amount of Next-generation sequencing (NGS) data. Their orchestration is difficult with a simple Bash script, but bioinformatics workflow managers such as Nextflow provide a framework to overcome respective problems. This study used Nextflow to develop a bioinformatic pipeline for detecting expression quantitative trait loci (eQTL) using a DSL2 Nextflow modular syntax, to enable sharing the huge demand for computing power as well as data access limitation across different partners often associated with eQTL studies. Based on the results from a test run with pilot data by measuring the required runtime and computational resources, the new pipeline should be suitable for eQTL studies in large scale analyses.
Affiliation(s)
- Frieder Hadlich
- Research Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
- Gabriel C M Moreira
- Unit of Animal Genomics, GIGA Institute, University of Liège, 4000 Liège, Belgium
- Jose Espinosa-Carrasco
- Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr. Aiguader 88, Barcelona 08003, Spain
- Changxi Li
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton T6G 2P5, Canada
- Lacombe Research and Development Centre, Agriculture and Agri-Food Canada, T4L 1W1 Lacombe, Canada
- Graham Plastow
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton T6G 2P5, Canada
- Daniel Fischer
- Natural Resources Institute Finland (Luke), Green Technology, Animal and Plant Genomics and Breeding, FI-31600 Jokioinen, Finland
- Carole Charlier
- Unit of Animal Genomics, GIGA Institute, University of Liège, 4000 Liège, Belgium
- Dominique Rocha
- Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
- Amanda J Chamberlain
- Agriculture Victoria Research, AgriBio, Centre for AgriBiosciences, Bundoora, VIC 3083, Australia
- School of Applied Systems Biology, La Trobe University, Bundoora, VIC 3083, Australia
- Christa Kuehn
- Research Institute for Farm Animal Biology (FBN), Wilhelm-Stahl-Allee 2, 18196 Dummerstorf, Germany
- Faculty of Agricultural and Environmental Science, University Rostock, Justus-von-Liebig-Weg 6, 18059 Rostock, Germany
- Friedrich-Loeffler-Institut (FLI), Federal Research Institute for Animal Health, 17493 Greifswald, Insel Riems, Germany
15
Yang Y, Chen Y, Xu S, Guo X, Jia G, Ping J, Shu X, Zhao T, Yuan F, Wang G, Xie Y, Ci H, Liu H, Qi Y, Liu Y, Liu D, Li W, Ye F, Shu XO, Zheng W, Li L, Cai Q, Long J. Integrating muti-omics data to identify tissue-specific DNA methylation biomarkers for cancer risk. Nat Commun 2024; 15:6071. [PMID: 39025880 PMCID: PMC11258330 DOI: 10.1038/s41467-024-50404-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2023] [Accepted: 07/10/2024] [Indexed: 07/20/2024] Open
Abstract
The relationship between tissue-specific DNA methylation and cancer risk remains inadequately elucidated. Leveraging resources from the Genotype-Tissue Expression consortium, here we develop genetic models to predict DNA methylation at CpG sites across the genome for seven tissues and apply these models to genome-wide association study data of corresponding cancers, namely breast, colorectal, renal cell, lung, ovarian, prostate, and testicular germ cell cancers. At Bonferroni-corrected P < 0.05, we identify 4248 CpGs that are significantly associated with cancer risk, of which 95.4% (4052) are specific to a particular cancer type. Notably, 92 CpGs within 55 putative novel loci retain significant associations with cancer risk after conditioning on proximal signals identified by genome-wide association studies. Integrative multi-omics analyses reveal 854 CpG-gene-cancer trios, suggesting that DNA methylation at 309 distinct CpGs might influence cancer risk through regulating the expression of 205 unique cis-genes. These findings substantially advance our understanding of the interplay between genetics, epigenetics, and gene expression in cancer etiology.
Affiliation(s)
- Yaohua Yang
- Center for Public Health Genomics, Department of Public Health Sciences, UVA Comprehensive Cancer Center, School of Medicine, University of Virginia, Charlottesville, VA, USA.
- Yaxin Chen
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Shuai Xu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Guochong Jia
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Jie Ping
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Xiang Shu
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Tianying Zhao
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Fangcheng Yuan
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Gang Wang
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yufang Xie
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Hang Ci
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Hongmo Liu
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yawen Qi
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Yongjun Liu
- Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, WA, USA
- Dan Liu
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Weimin Li
- Institute of Respiratory Health, Frontiers Science Center for Disease‑Related Molecular Network, State Key Laboratory of Respiratory Health and Multimorbidity, Department of Respiratory and Critical Care Medicine, West China Hospital, Sichuan University, Chengdu, Sichuan, China
- Fei Ye
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, USA
- Xiao-Ou Shu
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Wei Zheng
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Li Li
- Department of Family Medicine, UVA Comprehensive Cancer Center, School of Medicine, University of Virginia, Charlottesville, VA, USA
- Qiuyin Cai
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA.
- Jirong Long
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA.
16
Çelik MH, Gagneur J, Lim RG, Wu J, Thompson LM, Xie X. Identifying dysregulated regions in amyotrophic lateral sclerosis through chromatin accessibility outliers. HGG Adv 2024; 5:100318. [PMID: 38872308 PMCID: PMC11260578 DOI: 10.1016/j.xhgg.2024.100318] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Revised: 06/10/2024] [Accepted: 06/11/2024] [Indexed: 06/15/2024] Open
Abstract
The high heritability of amyotrophic lateral sclerosis (ALS) contrasts with its low molecular diagnosis rate post-genetic testing, pointing to potential undiscovered genetic factors. To aid the exploration of these factors, we introduced EpiOut, an algorithm to identify chromatin accessibility outliers that are regions exhibiting divergent accessibility from the population baseline in a single or few samples. Annotation of accessible regions with histone chromatin immunoprecipitation sequencing and Hi-C indicates that outliers are concentrated in functional loci, especially among promoters interacting with active enhancers. Across different omics levels, outliers are robustly replicated, and chromatin accessibility outliers are reliable predictors of gene expression outliers and aberrant protein levels. When promoter accessibility does not align with gene expression, our results indicate that molecular aberrations are more likely to be linked to post-transcriptional regulation rather than transcriptional regulation. Our findings demonstrate that the outlier detection paradigm can uncover dysregulated regions in rare diseases. EpiOut is available at github.com/uci-cbcl/EpiOut.
Affiliation(s)
- Muhammed Hasan Çelik
- Department of Computer Science, University of California Irvine, Irvine, CA, USA; Center for Complex Biological Systems, University of California Irvine, Irvine, CA, USA
- Julien Gagneur
- Department of Informatics, Technical University of Munich, Garching, Germany; Helmholtz Association - Munich School for Data Science (MUDS), Munich, Germany; Institute of Human Genetics, School of Medicine, Technical University of Munich, Munich, Germany; Institute of Computational Biology, Helmholtz Center Munich, Neuherberg, Germany
- Ryan G Lim
- Institute for Memory Impairments and Neurological Disorders, University of California Irvine, Irvine, CA 92697, USA
- Jie Wu
- Department of Biological Chemistry, University of California Irvine, Irvine, CA, USA
- Leslie M Thompson
- Institute for Memory Impairments and Neurological Disorders, University of California Irvine, Irvine, CA 92697, USA; Department of Biological Chemistry, University of California Irvine, Irvine, CA, USA; UCI MIND, University of California Irvine, Irvine, CA, USA; Department of Psychiatry and Human Behavior and Sue and Bill Gross Stem Cell Center, University of California Irvine, Irvine, CA, USA; Department of Neurobiology and Behavior, University of California Irvine, Irvine, CA, USA
- Xiaohui Xie
- Department of Computer Science, University of California Irvine, Irvine, CA, USA.
17
Brümmer A, Bergmann S. Disentangling genetic effects on transcriptional and post-transcriptional gene regulation through integrating exon and intron expression QTLs. Nat Commun 2024; 15:3786. [PMID: 38710690 DOI: 10.1038/s41467-024-48244-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2023] [Accepted: 04/23/2024] [Indexed: 05/08/2024] Open
Abstract
Expression quantitative trait loci (eQTL) studies typically consider exon expression of genes and discard intronic RNA sequencing reads despite their information on RNA metabolism. Here, we quantify genetic effects on exon and intron levels of genes and their ratio in lymphoblastoid cell lines, revealing thousands of cis-QTLs of each type. While genetic effects are often shared between cis-QTL types, 7814 (47%) are not detected as top cis-QTLs at exon levels. We show that exon levels preferentially capture genetic effects on transcriptional regulation, while exon-intron-ratios better detect those on co- and post-transcriptional processes. Considering all cis-QTL types substantially increases (by 71%) the number of colocalizing variants identified by genome-wide association studies (GWAS). It further allows dissecting the potential gene regulatory processes underlying GWAS associations, suggesting comparable contributions by transcriptional (50%) and co- and post-transcriptional regulation (46%) to complex traits. Overall, integrating intronic RNA sequencing reads in eQTL studies expands our understanding of genetic effects on gene regulatory processes.
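As a rough illustration of the exon-intron ratio mentioned in this abstract, the sketch below computes a per-gene, per-sample log2 ratio of exonic to intronic read counts, which could then serve as a QTL phenotype. The function name, the input layout, and the pseudocount are assumptions made for illustration; the abstract does not describe the authors' actual normalization.

```python
import math

def log2_exon_intron_ratio(exon_counts, intron_counts, pseudocount=1.0):
    """Return {gene: [log2((exon + pc) / (intron + pc)) per sample]}.

    exon_counts and intron_counts map gene -> list of read counts, one per
    sample, in the same sample order. The pseudocount (an assumption here)
    avoids division by zero for genes with no intronic reads.
    """
    ratios = {}
    for gene, exon in exon_counts.items():
        intron = intron_counts[gene]
        ratios[gene] = [
            math.log2((e + pseudocount) / (i + pseudocount))
            for e, i in zip(exon, intron)
        ]
    return ratios

# Hypothetical counts for one gene across two samples.
example = log2_exon_intron_ratio({"G1": [15, 7]}, {"G1": [3, 7]})
print(example)
```

A positive value means relatively more exonic than intronic signal in that sample; in practice, library-size normalization and covariate correction would precede any association test.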
Affiliation(s)
- Anneke Brümmer
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Bioinformatics Competence Center, University of Lausanne, Lausanne, Switzerland.
- Sven Bergmann
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland.
- Swiss Institute of Bioinformatics, Lausanne, Switzerland.
- Department of Integrative Biomedical Sciences, University of Cape Town, Cape Town, South Africa.
18
Popp JM, Rhodes K, Jangi R, Li M, Barr K, Tayeb K, Battle A, Gilad Y. Cell-type and dynamic state govern genetic regulation of gene expression in heterogeneous differentiating cultures. bioRxiv 2024:2024.05.02.592174. [PMID: 38746382 PMCID: PMC11092595 DOI: 10.1101/2024.05.02.592174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Identifying the molecular effects of human genetic variation across cellular contexts is crucial for understanding the mechanisms underlying disease-associated loci, yet many cell-types and developmental stages remain underexplored. Here we harnessed the potential of heterogeneous differentiating cultures (HDCs), an in vitro system in which pluripotent cells asynchronously differentiate into a broad spectrum of cell-types. We generated HDCs for 53 human donors and collected single-cell RNA-sequencing data from over 900,000 cells. We identified expression quantitative trait loci in 29 cell-types and characterized regulatory dynamics across diverse differentiation trajectories. This revealed novel regulatory variants for genes involved in key developmental and disease-related processes while replicating known effects from primary tissues, and dynamic regulatory effects associated with a range of complex traits.
19
Jeong R, Bulyk ML. Chromatin accessibility variation provides insights into missing regulation underlying immune-mediated diseases. bioRxiv 2024:2024.04.12.589213. [PMID: 38659802 PMCID: PMC11042205 DOI: 10.1101/2024.04.12.589213] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/26/2024]
Abstract
Most genetic loci associated with complex traits and diseases through genome-wide association studies (GWAS) are noncoding, suggesting that the causal variants likely have gene regulatory effects. However, only a small number of loci have been linked to expression quantitative trait loci (eQTLs) detected currently. To better understand the potential reasons for many trait-associated loci lacking eQTL colocalization, we investigated whether chromatin accessibility QTLs (caQTLs) in lymphoblastoid cell lines (LCLs) explain immune-mediated disease associations that eQTLs in LCLs did not. The power to detect caQTLs was greater than that of eQTLs and was less affected by the distance from the transcription start site of the associated gene. Meta-analyzing LCL eQTL data to increase the sample size to over a thousand led to additional loci with eQTL colocalization, demonstrating that insufficient statistical power is still likely to be a factor. Moreover, further eQTL colocalization loci were uncovered by surveying eQTLs of other immune cell types. Altogether, insufficient power and context-specificity of eQTLs both contribute to the 'missing regulation.'
Affiliation(s)
- Raehoon Jeong
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Martha L. Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
- Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA 02138, USA
- Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA
20
Rudra P, Zhou YH, Nobel A, Wright FA. Control of false discoveries in grouped hypothesis testing for eQTL data. BMC Bioinformatics 2024; 25:147. [PMID: 38605284 PMCID: PMC11007981 DOI: 10.1186/s12859-024-05736-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 03/08/2024] [Indexed: 04/13/2024] Open
Abstract
BACKGROUND: Expression quantitative trait locus (eQTL) analysis aims to detect the genetic variants that influence the expression of one or more genes. Gene-level eQTL testing forms a natural grouped-hypothesis testing strategy with clear biological importance. Methods to control family-wise error rate or false discovery rate for group testing have been proposed earlier, but may not be powerful or easily apply to eQTL data, for which certain structured alternatives may be defensible and may enable the researcher to avoid overly conservative approaches. RESULTS: In an empirical Bayesian setting, we propose a new method to control the false discovery rate (FDR) for grouped hypotheses. Here, each gene forms a group, with SNPs annotated to the gene corresponding to individual hypotheses. The heterogeneity of effect sizes in different groups is considered by the introduction of a random effects component. Our method, entitled Random Effects model and testing procedure for Group-level FDR control (REG-FDR), assumes a model for alternative hypotheses for the eQTL data and controls the FDR by adaptive thresholding. As a convenient alternate approach, we also propose Z-REG-FDR, an approximate version of REG-FDR, that uses only Z-statistics of association between genotype and expression for each gene-SNP pair. The performance of Z-REG-FDR is evaluated using both simulated and real data. Simulations demonstrate that Z-REG-FDR performs similarly to REG-FDR, but with much improved computational speed. CONCLUSION: Our results demonstrate that the Z-REG-FDR method performs favorably compared to other methods in terms of statistical power and control of FDR. It can be of great practical use for grouped hypothesis testing for eQTL analysis or similar problems in statistical genomics due to its fast computation and ability to be fit using only summary data.
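To make the grouped-testing setup concrete, here is a minimal sketch of a common baseline for gene-level FDR control: a Bonferroni-adjusted minimum p-value per gene, followed by the Benjamini-Hochberg procedure across genes. This is not REG-FDR or Z-REG-FDR, whose random-effects model is not specified in the abstract; it only shows the kind of conventional approach such methods are compared against. The function name and input layout are assumptions for illustration.

```python
from collections import defaultdict

def gene_level_bh(pairs, alpha=0.05):
    """Baseline gene-level FDR control (NOT the REG-FDR method).

    pairs: iterable of (gene, snp, p_value) for each gene-SNP test.
    Returns the sorted list of genes declared significant.
    """
    snp_p = defaultdict(list)
    for gene, _snp, p in pairs:
        snp_p[gene].append(p)

    # Step 1: collapse each gene (group) to one p-value using a
    # Bonferroni-adjusted minimum over its SNPs.
    gene_p = {g: min(1.0, min(ps) * len(ps)) for g, ps in snp_p.items()}

    # Step 2: Benjamini-Hochberg step-up procedure across genes.
    ranked = sorted(gene_p.items(), key=lambda kv: kv[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_gene, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank  # largest rank passing the BH threshold
    return sorted(g for g, _ in ranked[:cutoff])

# Hypothetical p-values for three genes.
tests = [("A", "s1", 0.0001), ("A", "s2", 0.5),
         ("B", "s1", 0.04), ("B", "s2", 0.9),
         ("C", "s1", 0.6)]
print(gene_level_bh(tests, alpha=0.05))
```

The Bonferroni step ignores linkage disequilibrium between SNPs and is therefore conservative, which is the kind of limitation the abstract attributes to earlier approaches.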
Affiliation(s)
- Pratyaydipta Rudra
- Department of Statistics, Oklahoma State University, Stillwater, OK, USA.
- Yi-Hui Zhou
- Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, USA
- Andrew Nobel
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
- Fred A Wright
- Bioinformatics Research Center, Departments of Statistics and Biological Sciences, North Carolina State University, Raleigh, NC, USA.
21
Hozumi Y, Tanemura KA, Wei GW. Preprocessing of Single Cell RNA Sequencing Data Using Correlated Clustering and Projection. J Chem Inf Model 2024; 64:2829-2838. [PMID: 37402705 PMCID: PMC11009150 DOI: 10.1021/acs.jcim.3c00674] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 07/06/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is a challenge due to sparsity and the large number of genes involved. Therefore, dimensionality reduction and feature selection are important for removing spurious signals and enhancing the downstream analysis. We present Correlated Clustering and Projection (CCP), a new data-domain dimensionality reduction method, for the first time. CCP projects each cluster of similar genes into a supergene defined as the accumulated pairwise nonlinear gene-gene correlations among all cells. Using 14 benchmark data sets, we demonstrate that CCP has significant advantages over classical principal component analysis (PCA) for clustering and/or classification problems with intrinsically high dimensionality. In addition, we introduce the Residue-Similarity index (RSI) as a novel metric for clustering and classification and the R-S plot as a new visualization tool. We show that the RSI correlates with accuracy without requiring the knowledge of the true labels. The R-S plot provides a unique alternative to the uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE) for data with a large number of cell types.
Affiliation(s)
- Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Kiyoto Aramis Tanemura
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
|
22
|
Jensen TD, Ni B, Reuter CM, Gorzynski JE, Fazal S, Bonner D, Ungar RA, Goddard PC, Raja A, Ashley EA, Bernstein JA, Zuchner S, Greicius MD, Montgomery SB, Schatz MC, Wheeler MT, Battle A. Integration of transcriptomics and long-read genomics prioritizes structural variants in rare disease. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.03.22.24304565. [PMID: 38585781 PMCID: PMC10996727 DOI: 10.1101/2024.03.22.24304565] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Rare structural variants (SVs) - insertions, deletions, and complex rearrangements - can cause Mendelian disease, yet they remain difficult to accurately detect and interpret. We sequenced and analyzed Oxford Nanopore long-read genomes of 68 individuals from the Undiagnosed Disease Network (UDN) with no previously identified diagnostic mutations from short-read sequencing. Using our optimized SV detection pipelines and 571 control long-read genomes, we detected 716 long-read rare (MAF < 0.01) SV alleles per genome on average, achieving a 2.4x increase from short-reads. To characterize the functional effects of rare SVs, we assessed their relationship with gene expression from blood or fibroblasts from the same individuals, and found that rare SVs overlapping enhancers were enriched (LOR = 0.46) near expression outliers. We also evaluated tandem repeat expansions (TREs) and found 14 rare TREs per genome; notably these TREs were also enriched near overexpression outliers. To prioritize candidate functional SVs, we developed Watershed-SV, a probabilistic model that integrates expression data with SV-specific genomic annotations, which significantly outperforms baseline models that don't incorporate expression data. Watershed-SV identified a median of eight high-confidence functional SVs per UDN genome. Notably, this included compound heterozygous deletions in FAM177A1 shared by two siblings, which were likely causal for a rare neurodevelopmental disorder. Our observations demonstrate the promise of integrating long-read sequencing with gene expression towards improving the prioritization of functional SVs and TREs in rare disease patients.
|
23
|
Lin W, Wall JD, Li G, Newman D, Yang Y, Abney M, VandeBerg JL, Olivier M, Gilad Y, Cox LA. Genetic regulatory effects in response to a high-cholesterol, high-fat diet in baboons. CELL GENOMICS 2024; 4:100509. [PMID: 38430910 PMCID: PMC10943580 DOI: 10.1016/j.xgen.2024.100509] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/01/2023] [Revised: 11/20/2023] [Accepted: 02/05/2024] [Indexed: 03/05/2024]
Abstract
Steady-state expression quantitative trait loci (eQTLs) explain only a fraction of disease-associated loci identified through genome-wide association studies (GWASs), while eQTLs involved in gene-by-environment (GxE) interactions have rarely been characterized in humans due to experimental challenges. Using a baboon model, we found hundreds of eQTLs that emerge in adipose, liver, and muscle after prolonged exposure to high dietary fat and cholesterol. Diet-responsive eQTLs exhibit genomic localization and genic features that are distinct from steady-state eQTLs. Furthermore, the human orthologs associated with diet-responsive eQTLs are enriched for GWAS genes associated with human metabolic traits, suggesting that context-responsive eQTLs with more complex regulatory effects are likely to explain GWAS hits that do not seem to overlap with standard eQTLs. Our results highlight the complexity of genetic regulatory effects and the potential of eQTLs with disease-relevant GxE interactions in enhancing the understanding of GWAS signals for human complex disease using non-human primate models.
Affiliation(s)
- Wenhe Lin
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA.
| | - Jeffrey D Wall
- Institute for Human Genetics, University of California, San Francisco, San Francisco, CA 94143, USA
| | - Ge Li
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
| | - Deborah Newman
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX 78229, USA
| | - Yunqi Yang
- Committee on Genetics, Genomics and System Biology, The University of Chicago, Chicago, IL 60637, USA
| | - Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
| | - John L VandeBerg
- Department of Human Genetics, South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX 78520, USA
| | - Michael Olivier
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA
| | - Yoav Gilad
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA; Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, IL 60637, USA.
| | - Laura A Cox
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC 27157, USA; Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX 78229, USA.
|
24
|
Johnson M, Chelysheva I, Öner D, McGinley J, Lin GL, O'Connor D, Robinson H, Drysdale SB, Gammin E, Vernon S, Muller J, Wolfenden H, Westcar S, Anguvaa L, Thwaites RS, Bont L, Wildenbeest J, Martinón-Torres F, Aerssens J, Openshaw PJM, Pollard AJ. A Genome-Wide Association Study of Respiratory Syncytial Virus Infection Severity in Infants. J Infect Dis 2024; 229:S112-S119. [PMID: 38271230 DOI: 10.1093/infdis/jiae029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Revised: 01/16/2024] [Accepted: 01/20/2024] [Indexed: 01/27/2024]
Abstract
BACKGROUND Respiratory syncytial virus (RSV) is a significant cause of infant morbidity and mortality worldwide. Most children experience at least 1 RSV infection by the age of 2 years, but not all develop severe disease. However, the understanding of genetic risk factors for severe RSV is incomplete. Consequently, we conducted a genome-wide association study of RSV severity. METHODS Disease severity was assessed by the ReSVinet scale, in a cohort of 251 infants aged 1 week to 1 year. Genotyping data were collected from multiple European study sites as part of the RESCEU Consortium. Linear regression models were used to assess the impact of genotype on RSV severity and gene expression as measured by microarray. RESULTS While no SNPs reached the genome-wide statistical significance threshold (P < 5 × 10⁻⁸), we identified 816 candidate SNPs with a P-value of <1 × 10⁻⁴. Functional annotation of candidate SNPs highlighted genes relevant to neutrophil trafficking and cytoskeletal functions, including LSP1 and RAB27A. Moreover, SNPs within the RAB27A locus significantly altered gene expression (false discovery rate, FDR P < .05). CONCLUSIONS These findings may provide insights into genetic mechanisms driving severe RSV infection, offering biologically relevant information for future investigations.
Affiliation(s)
- Mari Johnson
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Irina Chelysheva
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Deniz Öner
- Biomarkers Infectious Diseases, Janssen Pharmaceutica NV, Beerse, Belgium
| | - Joseph McGinley
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Gu-Lung Lin
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Daniel O'Connor
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Hannah Robinson
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Simon B Drysdale
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Emma Gammin
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Sophie Vernon
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Jill Muller
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
| | - Ryan S Thwaites
- National Heart and Lung Institute, Imperial College London, United Kingdom
| | - Louis Bont
- Department of Paediatric Infectious Diseases and Immunology, Wilhelmina Children's Hospital, University Medical Centre Utrecht, Netherlands
| | - Joanne Wildenbeest
- Department of Paediatric Infectious Diseases and Immunology, Wilhelmina Children's Hospital, University Medical Centre Utrecht, Netherlands
| | - Federico Martinón-Torres
- Translational Pediatrics and Infectious Diseases, Department of Pediatrics, Hospital Clínico Universitario de Santiago de Compostela
- Genetics, Vaccines and Infections Research Group, Instituto de Investigación Sanitaria de Santiago, Universidade de Santiago de Compostela
- Centro de Investigación Biomédica en Red de Enfermedades Respiratorias, Instituto de Salud Carlos III, Madrid, Spain
| | - Jeroen Aerssens
- Biomarkers Infectious Diseases, Janssen Pharmaceutica NV, Beerse, Belgium
| | - Peter J M Openshaw
- National Heart and Lung Institute, Imperial College London, United Kingdom
| | - Andrew J Pollard
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford
- NIHR Oxford Biomedical Research Centre and Oxford University Hospitals NHS Foundation Trust, United Kingdom
|
25
|
Feng H, Cottrell S, Hozumi Y, Wei GW. Multiscale differential geometry learning of networks with applications to single-cell RNA sequencing data. Comput Biol Med 2024; 171:108211. [PMID: 38422960 PMCID: PMC10965033 DOI: 10.1016/j.compbiomed.2024.108211] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/02/2024] [Revised: 02/02/2024] [Accepted: 02/25/2024] [Indexed: 03/02/2024]
Abstract
Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology, offering unparalleled insights into the intricate landscape of cellular diversity and gene expression dynamics. scRNA-seq analysis represents a challenging and cutting-edge frontier within the field of biological research. Differential geometry serves as a powerful mathematical tool in various applications of scientific research. In this study, we introduce, for the first time, a multiscale differential geometry (MDG) strategy for addressing the challenges encountered in scRNA-seq data analysis. We assume that intrinsic properties of cells lie on a family of low-dimensional manifolds embedded in the high-dimensional space of scRNA-seq data. Multiscale cell-cell interactive manifolds are constructed to reveal complex relationships in the cell-cell network, where curvature-based features for cells can decipher the intricate structural and biological information. We showcase the utility of our novel approach by demonstrating its effectiveness in classifying cell types. This innovative application of differential geometry in scRNA-seq analysis opens new avenues for understanding the intricacies of biological networks and holds great potential for network analysis in other fields.
Affiliation(s)
- Hongsong Feng
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Sean Cottrell
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Yuta Hozumi
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA
| | - Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, MI 48824, USA; Department of Electrical and Computer Engineering, Michigan State University, East Lansing, MI 48824, USA; Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI 48824, USA.
|
26
|
Casazza W, Inkster AM, Del Gobbo GF, Yuan V, Delahaye F, Marsit C, Park YP, Robinson WP, Mostafavi S, Dennis JK. Sex-dependent placental methylation quantitative trait loci provide insight into the prenatal origins of childhood onset traits and conditions. iScience 2024; 27:109047. [PMID: 38357671 PMCID: PMC10865402 DOI: 10.1016/j.isci.2024.109047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Revised: 06/19/2023] [Accepted: 01/23/2024] [Indexed: 02/16/2024]
Abstract
Molecular quantitative trait loci (QTLs) allow us to understand the biology captured in genome-wide association studies (GWASs). The placenta regulates fetal development and shows sex differences in DNA methylation. We therefore hypothesized that placental methylation QTL (mQTL) explain variation in genetic risk for childhood onset traits, and that effects differ by sex. We analyzed 411 term placentas from two studies and found 49,252 methylation (CpG) sites with mQTL and 2,489 CpG sites with sex-dependent mQTL. All mQTL were enriched in regions that typically affect gene expression in prenatal tissues. All mQTL were also enriched in GWAS results for growth- and immune-related traits, but male- and female-specific mQTL were more enriched than cross-sex mQTL. mQTL colocalized with trait loci at 777 CpG sites, with 216 (28%) specific to males or females. Overall, mQTL specific to male and female placenta capture otherwise overlooked variation in childhood traits.
Affiliation(s)
- William Casazza
- Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
| | - Amy M. Inkster
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Giulia F. Del Gobbo
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
- Children’s Hospital of Eastern Ontario Research Institute, University of Ottawa, Ottawa, ON, Canada
| | - Victor Yuan
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Carmen Marsit
- Rollins School of Public Health, Emory University, Atlanta, GA, USA
| | - Yongjin P. Park
- Department of Statistics, University of British Columbia, Vancouver, BC, Canada
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada
| | - Wendy P. Robinson
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
| | - Sara Mostafavi
- Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
- Paul Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA
| | - Jessica K. Dennis
- Centre for Molecular Medicine and Therapeutics, BC Children’s Hospital, Vancouver, BC, Canada
- Bioinformatics Graduate Program, University of British Columbia, Vancouver, BC, Canada
- BC Children’s Hospital Research Institute, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
|
27
|
Vochteloo M, Deelen P, Vink B, Tsai EA, Runz H, Andreu-Sánchez S, Fu J, Zhernakova A, Westra HJ, Franke L. PICALO: principal interaction component analysis for the identification of discrete technical, cell-type, and environmental factors that mediate eQTLs. Genome Biol 2024; 25:29. [PMID: 38254182 PMCID: PMC10802033 DOI: 10.1186/s13059-023-03151-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 12/20/2023] [Indexed: 01/24/2024]
Abstract
Expression quantitative trait loci (eQTL) offer insights into the regulatory mechanisms of trait-associated variants, but their effects often rely on contexts that are unknown or unmeasured. We introduce PICALO, a method for hidden variable inference of eQTL contexts. PICALO identifies and disentangles technical from biological context in heterogeneous blood and brain bulk eQTL datasets. These contexts are biologically informative and reproducible, outperforming cell counts or expression-based principal components. Furthermore, we show that RNA quality and cell type proportions interact with thousands of eQTLs. Knowledge of hidden eQTL contexts may aid in the inference of functional mechanisms underlying disease variants.
Affiliation(s)
- Martijn Vochteloo
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Patrick Deelen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Oncode Institute, Utrecht, The Netherlands
| | - Britt Vink
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Institute for Life Science & Technology, Hanze University of Applied Sciences, Groningen, The Netherlands
| | - Ellen A Tsai
- Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA
| | - Heiko Runz
- Translational Sciences, Research and Development, Biogen, Cambridge, MA, USA
| | - Sergio Andreu-Sánchez
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Jingyuan Fu
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
- Department of Pediatrics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Alexandra Zhernakova
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands
| | - Harm-Jan Westra
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
- Oncode Institute, Utrecht, The Netherlands.
| | - Lude Franke
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
- Oncode Institute, Utrecht, The Netherlands.
|
28
|
Tan P, Miles CE. Intrinsic statistical separation of subpopulations in heterogeneous collective motion via dimensionality reduction. Phys Rev E 2024; 109:014403. [PMID: 38366514 DOI: 10.1103/physreve.109.014403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 12/12/2023] [Indexed: 02/18/2024]
Abstract
Collective motion of locally interacting agents is found ubiquitously throughout nature. The inability to probe individuals has driven longstanding interest in the development of methods for inferring the underlying interactions. In the context of heterogeneous collectives, where the population consists of individuals driven by different interactions, existing approaches require some knowledge about the heterogeneities or underlying interactions. Here, we investigate the feasibility of identifying the identities in a heterogeneous collective without such prior knowledge. We numerically explore the behavior of a heterogeneous Vicsek model and find sufficiently long trajectories intrinsically cluster in a principal component analysis-based dimensionally reduced model-agnostic description of the data. We identify how heterogeneities in each parameter in the model (interaction radius, noise, population proportions) dictate this clustering. Finally, we show the generality of this phenomenon by finding similar behavior in a heterogeneous D'Orsogna model. Altogether, our results establish and quantify the intrinsic model-agnostic statistical disentanglement of identities in heterogeneous collectives.
Affiliation(s)
- Pei Tan
- Mathematical, Computational, and Systems Biology Graduate Program, University of California, Irvine 92697, USA
|
29
|
Lin W, Wall JD, Li G, Newman D, Yang Y, Abney M, VandeBerg JL, Olivier M, Gilad Y, Cox LA. Genetic regulatory effects in response to a high cholesterol, high fat diet in baboons. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.01.551489. [PMID: 37577666 PMCID: PMC10418186 DOI: 10.1101/2023.08.01.551489] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Steady-state expression quantitative trait loci (eQTLs) explain only a fraction of disease-associated loci identified through genome-wide association studies (GWAS), while eQTLs involved in gene-by-environment (GxE) interactions have rarely been characterized in humans due to experimental challenges. Using a baboon model, we found hundreds of eQTLs that emerge in adipose, liver, and muscle after prolonged exposure to high dietary fat and cholesterol. Diet-responsive eQTLs exhibit genomic localization and genic features that are distinct from steady-state eQTLs. Furthermore, the human orthologs associated with diet-responsive eQTLs are enriched for GWAS genes associated with human metabolic traits, suggesting that context-responsive eQTLs with more complex regulatory effects are likely to explain GWAS hits that do not seem to overlap with standard eQTLs. Our results highlight the complexity of genetic regulatory effects and the potential of eQTLs with disease-relevant GxE interactions in enhancing the understanding of GWAS signals for human complex disease using nonhuman primate models.
Affiliation(s)
- Wenhe Lin
- Department of Human Genetics, The University of Chicago, Chicago, USA
| | - Jeffrey D. Wall
- Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
- Present address: Galatea Bio, Hialeah, FL, USA
| | - Ge Li
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Deborah Newman
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
| | - Yunqi Yang
- Committee on Genetics, Genomics and System Biology, The University of Chicago, Chicago, USA
| | - Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, USA
| | - John L. VandeBerg
- Department of Human Genetics, South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Michael Olivier
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Yoav Gilad
- Department of Human Genetics, The University of Chicago, Chicago, USA
- Department of Medicine, Section of Genetic Medicine, The University of Chicago, Chicago, IL, USA
- Lead contact
| | - Laura A. Cox
- Center for Precision Medicine, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Southwest National Primate Research Center, Texas Biomedical Research Institute, San Antonio, TX, USA
|
30
|
Zhou HJ, Ge X, Li JJ. ClipperQTL: ultrafast and powerful eGene identification method. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.28.555191. [PMID: 37693523 PMCID: PMC10491229 DOI: 10.1101/2023.08.28.555191] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/12/2023]
Abstract
A central task in expression quantitative trait locus (eQTL) analysis is to identify cis-eGenes (henceforth "eGenes"), i.e., genes whose expression levels are regulated by at least one local genetic variant. Among the existing eGene identification methods, FastQTL is considered the gold standard but is computationally expensive as it requires thousands of permutations for each gene. Alternative methods such as eigenMT and TreeQTL have lower power than FastQTL. In this work, we propose ClipperQTL, which reduces the number of permutations needed from thousands to 20 for data sets with large sample sizes (> 450) by using the contrastive strategy developed in Clipper; for data sets with smaller sample sizes, it uses the same permutation-based approach as FastQTL. We show that ClipperQTL performs as well as FastQTL and runs about 500 times faster if the contrastive strategy is used and 50 times faster if the conventional permutation-based approach is used. The R package ClipperQTL is available at https://github.com/heatherjzhou/ClipperQTL.
Affiliation(s)
- Heather J. Zhou
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Xinzhou Ge
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Current address: Department of Statistics, Oregon State University, Corvallis, OR 97330, USA
| | - Jingyi Jessica Li
- Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Computational Medicine, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California, Los Angeles, Los Angeles, CA 90095, USA
|
31
|
Cuomo ASE, Nathan A, Raychaudhuri S, MacArthur DG, Powell JE. Single-cell genomics meets human genetics. Nat Rev Genet 2023; 24:535-549. [PMID: 37085594 PMCID: PMC10784789 DOI: 10.1038/s41576-023-00599-5] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Accepted: 03/29/2023] [Indexed: 04/23/2023]
Abstract
Single-cell genomic technologies are revealing the cellular composition, identities and states in tissues at unprecedented resolution. They have now scaled to the point that it is possible to query samples at the population level, across thousands of individuals. Combining single-cell information with genotype data at this scale provides opportunities to link genetic variation to the cellular processes underpinning key aspects of human biology and disease. This strategy has potential implications for disease diagnosis, risk prediction and development of therapeutic solutions. But, effectively integrating large-scale single-cell genomic data, genetic variation and additional phenotypic data will require advances in data generation and analysis methods. As single-cell genetics begins to emerge as a field in its own right, we review its current state and the challenges and opportunities ahead.
Affiliation(s)
- Anna S E Cuomo
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia.
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia.
| | - Aparna Nathan
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Soumya Raychaudhuri
- Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Divisions of Rheumatology and Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Daniel G MacArthur
- Centre for Population Genomics, Garvan Institute of Medical Research, Sydney, New South Wales, Australia
- Centre for Population Genomics, Murdoch Children's Research Institute, Melbourne, Victoria, Australia
| | - Joseph E Powell
- Garvan Institute of Medical Research, Darlinghurst, Sydney, New South Wales, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, New South Wales, Australia.
|
32
|
Xue A, Yazar S, Neavin D, Powell JE. Pitfalls and opportunities for applying latent variables in single-cell eQTL analyses. Genome Biol 2023; 24:33. [PMID: 36823676 PMCID: PMC9948363 DOI: 10.1186/s13059-023-02873-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2022] [Accepted: 02/13/2023] [Indexed: 02/25/2023]
Abstract
Using latent variables in gene expression data can help correct unobserved confounders and increase statistical power for expression quantitative trait loci (eQTL) detection. The probabilistic estimation of expression residuals (PEER) and principal component analysis (PCA) are widely used methods that can remove unwanted variation and improve eQTL discovery power in bulk RNA-seq analysis. However, their performance has not been evaluated extensively in single-cell eQTL analysis, especially for different cell types. Potential challenges arise due to the structure of single-cell RNA-seq data, including sparsity, skewness, and mean-variance relationship. Here, we show by a series of analyses that PEER and PCA require additional quality control and data transformation steps on the pseudo-bulk matrix to obtain valid latent variables; otherwise, the resulting factors can be highly correlated (Pearson's correlation r = 0.63–0.99). Incorporating valid PFs/PCs in the eQTL association model would identify 1.7–13.3% more eGenes. Sensitivity analysis showed that the pattern of change between the number of eGenes detected and fitted PFs/PCs varied significantly in different cell types. In addition, using highly variable genes to generate latent variables could achieve similar eGene discovery power as using all genes but save considerable computational resources (~6.2-fold faster).
Affiliation(s)
- Angli Xue
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia.
- School of Biomedical Sciences, University of New South Wales, Sydney, NSW, 2052, Australia.
| | - Seyhan Yazar
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia
| | - Drew Neavin
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia
| | - Joseph E Powell
- Garvan-Weizmann Centre for Cellular Genomics, Garvan Institute of Medical Research, Sydney, NSW, 2010, Australia.
- UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney, NSW, 2052, Australia.
|