1
|
Affiliation(s)
- Fernando Alfonso
- Department of Cardiology, IIS-IP, CIBER-CV, Hospital Universitario de La Princesa, Universidad Autónoma de Madrid, C/ Diego de León 62, Madrid 28006, Spain
| | - Alexander Marschall
- Department of Cardiology, IIS-IP, CIBER-CV, Hospital Universitario de La Princesa, Universidad Autónoma de Madrid, C/ Diego de León 62, Madrid 28006, Spain
| | - Fernando Rivero
- Department of Cardiology, IIS-IP, CIBER-CV, Hospital Universitario de La Princesa, Universidad Autónoma de Madrid, C/ Diego de León 62, Madrid 28006, Spain
| |
|
2
|
Grady SK, Dojcsak L, Harville EW, Wallace ME, Vilda D, Donneyong MM, Hood DB, Valdez RB, Ramesh A, Im W, Matthews-Juarez P, Juarez PD, Langston MA. Seminar: Scalable Preprocessing Tools for Exposomic Data Analysis. ENVIRONMENTAL HEALTH PERSPECTIVES 2023; 131:124201. [PMID: 38109119 PMCID: PMC10727037 DOI: 10.1289/ehp12901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 11/22/2023] [Accepted: 11/28/2023] [Indexed: 12/19/2023]
Abstract
BACKGROUND: The exposome serves as a popular framework in which to study exposures from chemical and nonchemical stressors across the life course and the differing roles that these exposures can play in human health. As a result, data relevant to the exposome have been used as a resource in the quest to untangle complicated health trajectories and help connect the dots from exposures to adverse outcome pathways.
OBJECTIVES: The primary aim of this methods seminar is to clarify and review preprocessing techniques critical for accurate and effective external exposomic data analysis. Scalability is emphasized through an application of highly innovative combinatorial techniques coupled with more traditional statistical strategies. The Public Health Exposome is used as an archetypical model. The novelty and innovation of this seminar's focus stem from its methodical, comprehensive treatment of preprocessing and its demonstration of the positive effects preprocessing can have on downstream analytics.
DISCUSSION: State-of-the-art technologies are described for data harmonization and to mitigate noise, which can stymie downstream interpretation, and to select key exposomic features, without which analytics may lose focus. A main task is the reduction of multicollinearity, a particularly formidable problem that frequently arises from repeated measurements of similar events taken at various times and from multiple sources. Empirical results highlight the effectiveness of a carefully planned preprocessing workflow as demonstrated in the context of more highly concentrated variable lists, improved correlational distributions, and enhanced downstream analytics for latent relationship discovery. The nascent field of exposome science can be characterized by the need to analyze and interpret a complex confluence of highly inhomogeneous spatial and temporal data, which may present formidable challenges to even the most powerful analytical tools. A systematic approach to preprocessing can therefore provide an essential first step in the application of modern computer and data science methods. https://doi.org/10.1289/EHP12901.
Affiliation(s)
- Stephen K. Grady
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, Tennessee, USA
| | - Levente Dojcsak
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
| | - Emily W. Harville
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, USA
| | - Maeve E. Wallace
- Department of Social, Behavioral, and Population Sciences, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, USA
| | - Dovile Vilda
- Department of Social, Behavioral, and Population Sciences, Tulane University School of Public Health and Tropical Medicine, New Orleans, Louisiana, USA
| | | | - Darryl B. Hood
- Division of Environmental Health Sciences, College of Public Health, Ohio State University, Columbus, Ohio, USA
| | - R. Burciaga Valdez
- Department of Economics, University of New Mexico, Albuquerque, New Mexico, USA
| | - Aramandla Ramesh
- Department of Biochemistry, Cancer Biology, Neuroscience & Pharmacology, Meharry Medical College, Nashville, Tennessee, USA
| | - Wansoo Im
- Department of Family and Community Medicine, Meharry Medical College, Nashville, Tennessee, USA
| | | | - Paul D. Juarez
- Department of Family and Community Medicine, Meharry Medical College, Nashville, Tennessee, USA
- Institute on Health Disparities, Equity, and the Exposome, Meharry Medical College, Nashville, Tennessee, USA
| | - Michael A. Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA
| |
|