1
|
Fox NS, Tian M, Markowitz AL, Haider S, Li CH, Boutros PC. iSubGen generates integrative disease subtypes by pairwise similarity assessment. CELL REPORTS METHODS 2024; 4:100884. [PMID: 39447572 PMCID: PMC11705582 DOI: 10.1016/j.crmeth.2024.100884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/06/2021] [Revised: 07/06/2023] [Accepted: 10/01/2024] [Indexed: 10/26/2024]
Abstract
There are myriad types of biomedical data-molecular, clinical images, and others. When a group of patients with the same underlying disease exhibits similarities across multiple types of data, this is called a subtype. Existing subtyping approaches struggle to handle diverse data types with missing information. To improve subtype discovery, we exploited changes in the correlation-structure between different data types to create iSubGen, an algorithm for integrative subtype generation. iSubGen can accommodate any feature that can be compared with a similarity metric to create subtypes versatilely. It can combine arbitrary data types for subtype discovery, such as merging genetic, transcriptomic, proteomic, and pathway data. iSubGen recapitulates known subtypes across multiple cancers even with substantial missing data and identifies subtypes with distinct clinical behaviors. It performs equally with or superior to other subtyping methods, offering greater stability and robustness to missing data and flexibility to new data types. It is available at https://cran.r-project.org/web/packages/iSubGen.
Collapse
Affiliation(s)
- Natalie S Fox
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA; Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
| | - Mao Tian
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA.
| | - Alexander L Markowitz
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA
| | - Syed Haider
- The Breast Cancer Now Toby Robins Research Centre, The Institute of Cancer Research, London, UK
| | - Constance H Li
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA; Ontario Institute for Cancer Research, Toronto, ON M5G 0A3, Canada
| | - Paul C Boutros
- Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada; Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA, USA; Institute for Precision Health, University of California, Los Angeles, Los Angeles, CA, USA; Jonsson Comprehensive Cancer Center, University of California, Los Angeles, Los Angeles, CA, USA; Department of Pharmacology and Toxicology, University of Toronto, Toronto, ON M5S 1A8, Canada; Department of Urology, University of California, Los Angeles, Los Angeles, CA, USA; Broad Stem Cell Research Center, University of California, Los Angeles, Los Angeles, CA, USA.
| |
Collapse
|
2
|
Li Y, Wong KY, Howard AG, Gordon-Larsen P, Highland HM, Graff M, North KE, Downie CG, Avery CL, Yu B, Young KL, Buchanan VL, Kaplan R, Hou L, Joyce BT, Qi Q, Sofer T, Moon JY, Lin DY. Multivariable Mendelian randomization with incomplete measurements on the exposure variables in the Hispanic Community Health Study/Study of Latinos. HGG ADVANCES 2024; 5:100338. [PMID: 39095990 PMCID: PMC11382109 DOI: 10.1016/j.xhgg.2024.100338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2024] [Revised: 07/27/2024] [Accepted: 07/27/2024] [Indexed: 08/04/2024] Open
Abstract
Multivariable Mendelian randomization allows simultaneous estimation of direct causal effects of multiple exposure variables on an outcome. When the exposure variables of interest are quantitative omic features, obtaining complete data can be economically and technically challenging: the measurement cost is high, and the measurement devices may have inherent detection limits. In this paper, we propose a valid and efficient method to handle unmeasured and undetectable values of the exposure variables in a one-sample multivariable Mendelian randomization analysis with individual-level data. We estimate the direct causal effects with maximum likelihood estimation and develop an expectation-maximization algorithm to compute the estimators. We show the advantages of the proposed method through simulation studies and provide an application to the Hispanic Community Health Study/Study of Latinos, which has a large amount of unmeasured exposure data.
Collapse
Affiliation(s)
- Yilun Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
| | - Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Penny Gordon-Larsen
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Carolina G Downie
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Christy L Avery
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Bing Yu
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Kristin L Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Victoria L Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Robert Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Lifang Hou
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Brian Thomas Joyce
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Tamar Sofer
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Dan-Yu Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
| |
Collapse
|
3
|
Chen BD, Lee C, Tapia AL, Reiner AP, Tang H, Kooperberg C, Manson JE, Li Y, Raffield LM. Proteome-wide association study using cis and trans variants and applied to blood cell and lipid-related traits in the Women's Health Initiative study. Genet Epidemiol 2024; 48:310-323. [PMID: 38940271 DOI: 10.1002/gepi.22578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2023] [Revised: 05/26/2024] [Accepted: 06/13/2024] [Indexed: 06/29/2024]
Abstract
In most Proteome-Wide Association Studies (PWAS), variants near the protein-coding gene (±1 Mb), also known as cis single nucleotide polymorphisms (SNPs), are used to predict protein levels, which are then tested for association with phenotypes. However, proteins can be regulated through variants outside of the cis region. An intermediate GWAS step to identify protein quantitative trait loci (pQTL) allows for the inclusion of trans SNPs outside the cis region in protein-level prediction models. Here, we assess the prediction of 540 proteins in 1002 individuals from the Women's Health Initiative (WHI), split equally into a GWAS set, an elastic net training set, and a testing set. We compared the testing r2 between measured and predicted protein levels using this proposed approach, to the testing r2 using only cis SNPs. The two methods usually resulted in similar testing r2, but some proteins showed a significant increase in testing r2 with our method. For example, for cartilage acidic protein 1, the testing r2 increased from 0.101 to 0.351. We also demonstrate reproducible findings for predicted protein association with lipid and blood cell traits in WHI participants without proteomics data and in UK Biobank utilizing our PWAS weights.
Collapse
Affiliation(s)
- Brian D Chen
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Chanhwa Lee
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Amanda L Tapia
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, Pennsylvania, USA
| | - Alexander P Reiner
- Department of Epidemiology, University of Washington, Seattle, Washington, USA
| | - Hua Tang
- Department of Genetics, Stanford University School of Medicine, Stanford, California, USA
| | - Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, Washington, USA
| | - JoAnn E Manson
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, USA
| | - Yun Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Laura M Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
Collapse
|
4
|
Li Y, Wong KY, Howard AG, Gordon-Larsen P, Highland HM, Graff M, North KE, Downie CG, Avery CL, Yu B, Young KL, Buchanan VL, Kaplan R, Hou L, Joyce BT, Qi Q, Sofer T, Moon JY, Lin DY. Mendelian randomization with incomplete measurements on the exposure in the Hispanic Community Health Study/Study of Latinos. HGG ADVANCES 2024; 5:100245. [PMID: 37817410 PMCID: PMC10628889 DOI: 10.1016/j.xhgg.2023.100245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2023] [Revised: 10/02/2023] [Accepted: 10/02/2023] [Indexed: 10/12/2023] Open
Abstract
Mendelian randomization has been widely used to assess the causal effect of a heritable exposure variable on an outcome of interest, using genetic variants as instrumental variables. In practice, data on the exposure variable can be incomplete due to high cost of measurement and technical limits of detection. In this paper, we propose a valid and efficient method to handle both unmeasured and undetectable values of the exposure variable in one-sample Mendelian randomization analysis with individual-level data. We estimate the causal effect of the exposure variable on the outcome using maximum likelihood estimation and develop an expectation maximization algorithm for the computation of the estimator. Simulation studies show that the proposed method performs well in making inference on the causal effect. We apply our method to the Hispanic Community Health Study/Study of Latinos, a community-based prospective cohort study, and estimate the causal effect of several metabolites on phenotypes of interest.
Collapse
Affiliation(s)
- Yilun Li
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kin Yau Wong
- Department of Applied Mathematics, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
| | - Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Penny Gordon-Larsen
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Nutrition, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Carolina G Downie
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Christy L Avery
- Carolina Population Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Bing Yu
- Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Kristin L Young
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Victoria L Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
| | - Robert Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA; Public Health Sciences Division, Fred Hutchinson Cancer Center, Seattle, WA 98109, USA
| | - Lifang Hou
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Brian Thomas Joyce
- Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL 60611, USA
| | - Qibin Qi
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Tamar Sofer
- Department of Medicine, Harvard Medical School, Boston, MA 02115, USA
| | - Jee-Young Moon
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
| | - Dan-Yu Lin
- Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA.
| |
Collapse
|
5
|
Ye X, Shang Y, Shi T, Zhang W, Sakurai T. Multi-omics clustering for cancer subtyping based on latent subspace learning. Comput Biol Med 2023; 164:107223. [PMID: 37490833 DOI: 10.1016/j.compbiomed.2023.107223] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/30/2023] [Revised: 06/07/2023] [Accepted: 06/30/2023] [Indexed: 07/27/2023]
Abstract
The increased availability of high-throughput technologies has enabled biomedical researchers to learn about disease etiology across multiple omics layers, which shows promise for improving cancer subtype identification. Many computational methods have been developed to perform clustering on multi-omics data, however, only a few of them are applicable for partial multi-omics in which some samples lack data in some types of omics. In this study, we propose a novel multi-omics clustering method based on latent sub-space learning (MCLS), which can deal with the missing multi-omics for clustering. We utilize the data with complete omics to construct a latent subspace using PCA-based feature extraction and singular value decomposition (SVD). The data with incomplete multi-omics are then projected to the latent subspace, and spectral clustering is performed to find the clusters. The proposed MCLS method is evaluated on seven different cancer datasets on three levels of omics in both full and partial cases compared to several state-of-the-art methods. The experimental results show that the proposed MCLS method is more efficient and effective than the compared methods for cancer subtype identification in multi-omics data analysis, which provides important references to a comprehensive understanding of cancer and biological mechanisms. AVAILABILITY: The proposed method can be freely accessible at https://github.com/ShangCS/MCLS.
Collapse
Affiliation(s)
- Xiucai Ye
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan.
| | - Yifan Shang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Tianyi Shi
- Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Weihang Zhang
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan
| | - Tetsuya Sakurai
- Department of Computer Science, University of Tsukuba, Tsukuba, 3058577, Japan; Tsukuba Life Science Innovation Program, University of Tsukuba, Tsukuba, 3058577, Japan
| |
Collapse
|
6
|
Flores JE, Claborne DM, Weller ZD, Webb-Robertson BJM, Waters KM, Bramer LM. Missing data in multi-omics integration: Recent advances through artificial intelligence. Front Artif Intell 2023; 6:1098308. [PMID: 36844425 PMCID: PMC9949722 DOI: 10.3389/frai.2023.1098308] [Citation(s) in RCA: 40] [Impact Index Per Article: 20.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2022] [Accepted: 01/23/2023] [Indexed: 02/11/2023] Open
Abstract
Biological systems function through complex interactions between various 'omics (biomolecules), and a more complete understanding of these systems is only possible through an integrated, multi-omic perspective. This has presented the need for the development of integration approaches that are able to capture the complex, often non-linear, interactions that define these biological systems and are adapted to the challenges of combining the heterogenous data across 'omic views. A principal challenge to multi-omic integration is missing data because all biomolecules are not measured in all samples. Due to either cost, instrument sensitivity, or other experimental factors, data for a biological sample may be missing for one or more 'omic techologies. Recent methodological developments in artificial intelligence and statistical learning have greatly facilitated the analyses of multi-omics data, however many of these techniques assume access to completely observed data. A subset of these methods incorporate mechanisms for handling partially observed samples, and these methods are the focus of this review. We describe recently developed approaches, noting their primary use cases and highlighting each method's approach to handling missing data. We additionally provide an overview of the more traditional missing data workflows and their limitations; and we discuss potential avenues for further developments as well as how the missing data issue and its current solutions may generalize beyond the multi-omics context.
Collapse
Affiliation(s)
- Javier E. Flores
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Daniel M. Claborne
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Zachary D. Weller
- Pacific Northwest National Laboratory, Artificial Intelligence and Data Analytics Division, National Security Directorate, Richland, WA, United States
| | - Bobbie-Jo M. Webb-Robertson
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Katrina M. Waters
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| | - Lisa M. Bramer
- Pacific Northwest National Laboratory, Biological Sciences Division, Earth and Biological Sciences Directorate, Richland, WA, United States
| |
Collapse
|
7
|
KIDD JOHN, RAULERSON CHELSEAK, MOHLKE KARENL, LIN DANYU. Mediation analysis of multiple mediators with incomplete omics data. Genet Epidemiol 2023; 47:61-77. [PMID: 36125445 PMCID: PMC10423053 DOI: 10.1002/gepi.22504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2022] [Revised: 06/29/2022] [Accepted: 08/16/2022] [Indexed: 02/01/2023]
Abstract
There is an increasing interest in using multiple types of omics features (e.g., DNA sequences, RNA expressions, methylation, protein expressions, and metabolic profiles) to study how the relationships between phenotypes and genotypes may be mediated by other omics markers. Genotypes and phenotypes are typically available for all subjects in genetic studies, but typically, some omics data will be missing for some subjects, due to limitations such as cost and sample quality. In this article, we propose a powerful approach for mediation analysis that accommodates missing data among multiple mediators and allows for various interaction effects. We formulate the relationships among genetic variants, other omics measurements, and phenotypes through linear regression models. We derive the joint likelihood for models with two mediators, accounting for arbitrary patterns of missing values. Utilizing computationally efficient and stable algorithms, we conduct maximum likelihood estimation. Our methods produce unbiased and statistically efficient estimators. We demonstrate the usefulness of our methods through simulation studies and an application to the Metabolic Syndrome in Men study.
Collapse
Affiliation(s)
- JOHN KIDD
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| | - CHELSEA K. RAULERSON
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, U.S.A
| | - KAREN L. MOHLKE
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, U.S.A
| | - DAN-YU LIN
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, U.S.A
| |
Collapse
|
8
|
The Missing Indicator Approach for Accelerated Failure Time Model with Covariates Subject to Limits of Detection. STATS 2022. [DOI: 10.3390/stats5020029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The limit of detection (LOD) is commonly encountered in observational studies when one or more covariate values fall outside the measuring ranges. Although the complete-case (CC) approach is widely employed in the presence of missing values, it could result in biased estimations or even become inapplicable in small sample studies. On the other hand, approaches such as the missing indicator (MDI) approach are attractive alternatives as they preserve sample sizes. This paper compares the effectiveness of different alternatives to the CC approach under different LOD settings with a survival outcome. These alternatives include substitution methods, multiple imputation (MI) methods, MDI approaches, and MDI-embedded MI approaches. We found that the MDI approach outperformed its competitors regarding bias and mean squared error in small sample sizes through extensive simulation.
Collapse
|
9
|
Avery CL, Howard AG, Ballou AF, Buchanan VL, Collins JM, Downie CG, Engel SM, Graff M, Highland HM, Lee MP, Lilly AG, Lu K, Rager JE, Staley BS, North KE, Gordon-Larsen P. Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods. ENVIRONMENTAL HEALTH PERSPECTIVES 2022; 130:55001. [PMID: 35533073 PMCID: PMC9084332 DOI: 10.1289/ehp9098] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/04/2021] [Revised: 04/08/2022] [Accepted: 04/12/2022] [Indexed: 05/11/2023]
Abstract
Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. Objective: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. Discussion: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations. https://doi.org/10.1289/EHP9098.
Collapse
Affiliation(s)
- Christy L Avery
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Annie Green Howard
- Department of Biostatistics, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Anna F Ballou
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Victoria L Buchanan
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Jason M Collins
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Carolina G Downie
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Stephanie M Engel
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Mariaelisa Graff
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Heather M Highland
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Moa P Lee
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Adam G Lilly
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Department of Sociology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Kun Lu
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Julia E Rager
- Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Brooke S Staley
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Kari E North
- Department of Epidemiology, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| | - Penny Gordon-Larsen
- Department of Nutrition, Gillings School of Global Public Health, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Carolina Population Center, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
| |
Collapse
|