Thomas DM, Knight R, Gilbert JA, Cornelis MC, Gantz MG, Burdekin K, Cummiskey K, Sumner SCJ, Pathmasiri W, Sazonov E, Gabriel KP, Dooley EE, Green MA, Pfluger A, Kleinberg S. Transforming Big Data into AI-ready data for nutrition and obesity research. Obesity (Silver Spring) 2024; 32:857-870. [PMID: 38426232 DOI: 10.1002/oby.23989] [Received: 09/16/2023] [Revised: 12/15/2023] [Accepted: 12/26/2023] [Indexed: 03/02/2024]
Abstract
OBJECTIVE Big Data are increasingly used in obesity and nutrition research to gain new insights and derive personalized guidance; however, these data in raw form are often not usable. Substantial preprocessing, which requires machine learning (ML), human judgment, and specialized software, is needed to transform Big Data into artificial intelligence (AI)- and ML-ready data. These preprocessing steps are the most complex part of the entire modeling pipeline. End users must understand the complexity of these steps to reduce misunderstanding, faulty interpretation, and erroneous downstream conclusions. METHODS We reviewed three popular obesity/nutrition Big Data sources: microbiome, metabolomics, and accelerometry. The preprocessing pipelines, specialized software, challenges, and how decisions impact final AI- and ML-ready products were detailed. RESULTS Opportunities for advances to improve quality control, speed of preprocessing, and intelligent end user consumption were presented. CONCLUSIONS Big Data have exciting potential to identify new modifiable factors that impact obesity research. However, to ensure accurate interpretation of conclusions arising from Big Data, the choices involved in preparing AI- and ML-ready data need to be transparent to the investigators and clinicians who rely on those conclusions.
Affiliation(s)
- Diana M Thomas
  - Department of Mathematical Sciences, United States Military Academy, West Point, New York, USA
- Rob Knight
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California, USA
- Jack A Gilbert
  - Department of Pediatrics and Scripps Institution of Oceanography, University of California San Diego, La Jolla, California, USA
- Marilyn C Cornelis
  - Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, Illinois, USA
- Marie G Gantz
  - Biostatistics and Epidemiology Division, Research Triangle Institute International, Research Triangle Park, North Carolina, USA
- Kate Burdekin
  - Biostatistics and Epidemiology Division, Research Triangle Institute International, Research Triangle Park, North Carolina, USA
- Kevin Cummiskey
  - Department of Mathematical Sciences, United States Military Academy, West Point, New York, USA
- Susan C J Sumner
  - Department of Nutrition, Nutrition Research Institute, University of North Carolina Chapel Hill, Kannapolis, North Carolina, USA
- Wimal Pathmasiri
  - Department of Nutrition, Nutrition Research Institute, University of North Carolina Chapel Hill, Kannapolis, North Carolina, USA
- Edward Sazonov
  - Electrical and Computer Engineering Department, The University of Alabama, Tuscaloosa, Alabama, USA
- Kelley Pettee Gabriel
  - Department of Epidemiology, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Erin E Dooley
  - Department of Epidemiology, The University of Alabama at Birmingham, Birmingham, Alabama, USA
- Mark A Green
  - Department of Geography & Planning, University of Liverpool, Liverpool, UK
- Andrew Pfluger
  - Department of Geography and Environmental Engineering, United States Military Academy, West Point, New York, USA
- Samantha Kleinberg
  - Computer Science Department, Stevens Institute of Technology, Hoboken, New Jersey, USA