1
Mudappathi R, Patton T, Chen H, Yang P, Sun Z, Wang P, Shi CX, Wang J, Liu L. reg-eQTL: Integrating transcription factor effects to unveil regulatory variants. Am J Hum Genet 2025; 112:659-674. [PMID: 39922197 PMCID: PMC11947170 DOI: 10.1016/j.ajhg.2025.01.015] [Received: 08/28/2024] [Revised: 01/09/2025] [Accepted: 01/15/2025] [Indexed: 02/10/2025]
Abstract
Regulatory single-nucleotide variants (rSNVs) in noncoding regions of the genome play a crucial role in gene transcription by altering transcription factor (TF) binding, chromatin states, and other epigenetic modifications. Existing expression quantitative trait locus (eQTL) methods identify genomic loci associated with gene-expression changes, but they often fall short in pinpointing causal variants. We introduce reg-eQTL, a computational method that incorporates TF effects and interactions with genetic variants into eQTL analysis. This approach provides deeper insights into the regulatory mechanisms, bringing us one step closer to identifying potential causal variants by uncovering how TFs interact with SNVs to influence gene expression. This method defines a trio consisting of a genetic variant, a target gene, and a TF and tests its impact on gene transcription. In comprehensive simulations, reg-eQTL shows improved power of detecting rSNVs with low population frequency, weak effects, and synergetic interaction with TF as compared to traditional eQTL methods. Application of reg-eQTL to GTEx data from lung, brain, and whole-blood tissues uncovered regulatory trios that include eQTLs and increased the number of eQTLs shared across tissue types. Regulatory networks constructed on the basis of these trios reveal intricate gene regulation across tissue types.
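The abstract does not give the statistical form of the trio test. One common way to model a (variant, gene, TF) trio is a linear regression with an SNV-by-TF interaction term, and the sketch below illustrates that idea only. The function name `fit_trio`, the simulated effect sizes, and the use of ordinary least squares are all assumptions for illustration, not the published reg-eQTL implementation.

```python
import numpy as np

def fit_trio(expr, genotype, tf_level):
    """Fit expr ~ 1 + genotype + tf + genotype*tf by ordinary least squares.

    Hypothetical helper: returns the four coefficients and their t-statistics.
    """
    X = np.column_stack([
        np.ones_like(expr),      # intercept
        genotype,                # main effect of the variant
        tf_level,                # main effect of the TF
        genotype * tf_level,     # SNV x TF interaction
    ])
    beta, _, _, _ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = len(expr) - X.shape[1]
    sigma2 = resid @ resid / dof                 # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)        # coefficient covariance
    t_stats = beta / np.sqrt(np.diag(cov))
    return beta, t_stats

# Simulated data (assumed values): a low-frequency variant whose effect on
# expression appears only through its interaction with the TF.
rng = np.random.default_rng(0)
n = 500
genotype = rng.binomial(2, 0.1, size=n).astype(float)   # allele dosage 0/1/2
tf_level = rng.normal(size=n)                            # standardized TF expression
expr = 0.5 * tf_level + 0.8 * genotype * tf_level + rng.normal(size=n)

beta, t_stats = fit_trio(expr, genotype, tf_level)
print("interaction coefficient:", beta[3], "t-statistic:", t_stats[3])
```

In this simulated setting the variant has no marginal effect, so a model without the interaction term would have little power to detect it. That is the kind of case the abstract describes as synergetic interaction with the TF.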
Affiliation(s)
- Rekha Mudappathi
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA; Division of Epidemiology, Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ 85259, USA
- Tatiana Patton
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
- Hai Chen
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA; Division of Epidemiology, Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ 85259, USA
- Ping Yang
  - Division of Epidemiology, Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ 85259, USA
- Zhifu Sun
  - Division of Computational Biology, Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN 55905, USA
- Panwen Wang
  - Department of Quantitative Health Sciences and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA
- Chang-Xin Shi
  - Division of Hematology/Oncology, Department of Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA
- Junwen Wang
  - Department of Quantitative Health Sciences and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ, USA; Division of Applied Oral Sciences & Community Dental Care, Faculty of Dentistry, The University of Hong Kong, 34 Hospital Road, Hong Kong SAR, China
- Li Liu
  - College of Health Solutions, Arizona State University, Phoenix, AZ 85004, USA; Biodesign Institute, Arizona State University, Tempe, AZ 85281, USA
2
Wang Y, Sun F, Lin W, Zhang S. AC-PCoA: Adjustment for confounding factors using principal coordinate analysis. PLoS Comput Biol 2022; 18:e1010184. [PMID: 35830390 PMCID: PMC9278763 DOI: 10.1371/journal.pcbi.1010184] [Received: 10/12/2021] [Accepted: 05/08/2022] [Indexed: 12/01/2022]
Abstract
Confounding factors exist widely in various biological data owing to technical variations, population structures and experimental conditions. Such factors may mask the true signals and lead to spurious associations in the respective biological data, making it necessary to adjust confounding factors accordingly. However, existing confounder correction methods were mainly developed based on the original data or the pairwise Euclidean distance, either one of which is inadequate for analyzing different types of data, such as sequencing data. In this work, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, which reduces data dimension and extracts the information from different distance measures using principal coordinate analysis, and adjusts confounding factors across multiple datasets by minimizing the associations between lower-dimensional representations and confounding variables. Application of the proposed method was further extended to classification and prediction. We demonstrated the efficacy of AC-PCoA on three simulated datasets and five real datasets. Compared to the existing methods, AC-PCoA shows better results in visualization, statistical testing, clustering, and classification.
With today’s unprecedented amount of data, researchers are challenged by the need to enhance meaningful signals without the interference of unwanted confounders hidden inside the data. Data visualization is an important step toward exploring and explaining data in order to intuitively identify the dominant patterns. Principal coordinate analysis (PCoA), as a visualization tool, allows flexible ways to define pairwise distances and project the samples into lower dimensions without changing the distances. However, when visualizing large-scale biological datasets, the true patterns are often hindered by unwanted confounding variations, either biological or technical in origin. To eliminate these confounding factors and recover underlying signals, we proposed a method called Adjustment for Confounding factors using Principal Coordinate Analysis, or AC-PCoA, and showed that it significantly outperforms existing methods in visualization through three simulation studies and five real datasets. We further showed that the low-dimensional representations given by AC-PCoA provide promising results in statistical testing, clustering, and classification as well.
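As background for the method named above, here is a minimal sketch of classical principal coordinate analysis, the building block the abstract refers to. It does not include the confounder-adjustment step that defines AC-PCoA. The function name `pcoa` and the toy data are assumptions for illustration.

```python
import numpy as np

def pcoa(dist, n_components=2):
    """Classical PCoA: embed samples so pairwise distances are preserved.

    Hypothetical helper; plain PCoA only, with no confounder adjustment.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering       # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    vals = np.clip(eigvals[order], 0.0, None)      # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(vals)

# Toy data (assumed): six points in the plane and their Euclidean distances.
rng = np.random.default_rng(1)
points = rng.normal(size=(6, 2))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = pcoa(dist, n_components=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print("max distance error:", np.abs(recovered - dist).max())
```

Because the input here is an exact Euclidean distance matrix of two-dimensional points, two coordinates reproduce the distances up to floating-point error. Other distance measures, which the abstract says the method can accommodate, are generally not reproduced exactly in a low-dimensional embedding.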
Affiliation(s)
- Yu Wang
  - School of Mathematical Sciences, Fudan University, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
- Fengzhu Sun
  - Quantitative and Computational Biology Department, University of Southern California, Los Angeles, California, United States of America
- Wei Lin
  - School of Mathematical Sciences, Fudan University, Shanghai, China
  - Research Institute of Intelligent Complex Systems, Fudan University, Shanghai, China
  - State Key Laboratory of Medical Neurobiology, MOE Frontiers Center for Brain Science, and Institutes of Brain Science, Fudan University, Shanghai, China
  - Shanghai Artificial Intelligence Laboratory, Shanghai, China
  - Key Laboratory of Mathematics for Nonlinear Science (Fudan University), Ministry of Education, Shanghai, China
  - Shanghai Key Laboratory for Contemporary Applied Mathematics (Fudan University), Shanghai, China
- Shuqin Zhang
  - School of Mathematical Sciences, Fudan University, Shanghai, China
  - Key Laboratory of Mathematics for Nonlinear Science (Fudan University), Ministry of Education, Shanghai, China
  - Shanghai Key Laboratory for Contemporary Applied Mathematics (Fudan University), Shanghai, China
3
Gao C, Wei H, Zhang K. LORSEN: Fast and Efficient eQTL Mapping With Low Rank Penalized Regression. Front Genet 2021; 12:690926. [PMID: 34868194 PMCID: PMC8636089 DOI: 10.3389/fgene.2021.690926] [Received: 04/04/2021] [Accepted: 10/08/2021] [Indexed: 12/02/2022]
Abstract
Characterization of genetic variations that are associated with gene expression levels is essential to understand cellular mechanisms that underlie human complex traits. Expression quantitative trait loci (eQTL) mapping attempts to identify genetic variants, such as single nucleotide polymorphisms (SNPs), that affect the expression of one or more genes. With the availability of a large volume of gene expression data, it is necessary and important to develop fast and efficient statistical and computational methods to perform eQTL mapping for such large scale data. In this paper, we proposed a new method, the low rank penalized regression method (LORSEN), for eQTL mapping. We evaluated and compared the performance of LORSEN with two existing methods for eQTL mapping using extensive simulations as well as real data from the HapMap3 project. Simulation studies showed that our method outperformed two commonly used methods for eQTL mapping, LORS and FastLORS, in many scenarios in terms of area under the curve (AUC). We illustrated the usefulness of our method by applying it to SNP variants data and gene expression levels on four chromosomes from the HapMap3 Project.
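The abstract compares methods by area under the curve (AUC) but does not say how the AUC was computed. The sketch below shows one standard way to score a set of estimated SNP-gene effects against known true associations, using the pairwise-ranking definition of ROC AUC. The function name `roc_auc` and the toy labels and effect sizes are assumptions for illustration, not values from the paper.

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC as the probability that a true association outranks a null one.

    Hypothetical helper; ties between a positive and a negative count as half.
    """
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]       # scores of true SNP-gene associations
    neg = scores[~labels]      # scores of null SNP-gene pairs
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Toy example (assumed values): 1 marks a simulated true association,
# and est_effect holds the estimated effect size for each SNP-gene pair.
true_assoc = np.array([1, 0, 1, 0, 0, 1])
est_effect = np.array([0.9, 0.1, 0.4, 0.5, 0.0, 0.7])

auc = roc_auc(true_assoc, np.abs(est_effect))
print("AUC:", auc)
```

Here eight of the nine (true, null) pairs are ranked correctly, so the AUC is 8/9. The pairwise comparison is quadratic in the number of pairs, which is fine for a small illustration but would usually be replaced by a rank-based computation at scale.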
Affiliation(s)
- Cheng Gao
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States
- Hairong Wei
  - College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, MI, United States
- Kui Zhang
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, MI, United States