1
Gärber F, Bockmayr B, Creydt M, Fischer M, Seifert S. Data fusion of elemental and metabolic fingerprints of asparagus with random forest approaches. Anal Chim Acta 2025; 1357:344006. [PMID: 40316379 DOI: 10.1016/j.aca.2025.344006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2024] [Revised: 02/19/2025] [Accepted: 03/31/2025] [Indexed: 05/04/2025]
Abstract
BACKGROUND Various analytical methods such as liquid chromatography-mass spectrometry (LC-MS) and inductively coupled plasma-mass spectrometry (ICP-MS) are used for the characterisation and authentication of foods. These two analytical techniques target very different parts of the complex composition of the samples and therefore fusion of the data promises better performance of the corresponding models. RESULTS ICP-MS and LC-MS data were fused for the classification of the geographical origin of 220 asparagus samples with random forest. The results show that the combination of elemental and metabolomic fingerprints leads to an improvement of the accuracy from approximately 88 % to 92.3 %. In particular, the fusion improves the classification of small groups, which is reflected by an increase in the Cohen's Kappa value from around 0.7 to 0.8. Furthermore, we applied surrogate minimal depth (SMD) to elemental fingerprints and fused data of elemental and metabolomic fingerprints for the first time. This made it possible to select relevant features and evaluate their mutual impact on the classification model, illustrating the interplay of the elemental and metabolic variables in the fused random forest model. SIGNIFICANCE Using the classification of the geographical origin of asparagus, we show that the fusion of LC-MS and ICP-MS data is useful for improving the performance of food authentication. Furthermore, we show that SMD can be applied to analyse the mutual impact of features of single data sets but also across multiple data sets in the context of data fusion.
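The abstract reports both accuracy and Cohen's Kappa because the two respond differently to small groups. The following is a minimal sketch, not the authors' code: the function name `accuracy_and_kappa` and the confusion-matrix counts are hypothetical and chosen only to show how a model that ignores a small class can keep a high accuracy while Kappa drops to zero.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Agreement expected by chance from the row and column totals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: a large group (90 samples) and a small group (10 samples).
always_majority = [[90, 0], [10, 0]]   # small group is never recognised
finds_minority = [[88, 2], [4, 6]]     # small group is partly recognised

acc_a, kappa_a = accuracy_and_kappa(always_majority)  # accuracy 0.90, kappa 0.0
acc_b, kappa_b = accuracy_and_kappa(finds_minority)   # accuracy 0.94, kappa about 0.63
```

In this toy case accuracy moves only from 0.90 to 0.94, while Kappa moves from 0 to roughly 0.63, which is why an improvement on small groups shows up more clearly in Kappa.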
Affiliation(s)
- Florian Gärber
  - Hamburg School of Food Science, University of Hamburg, Grindelallee 117, Hamburg, 20146, Germany
- Bernadette Bockmayr
  - Hamburg School of Food Science, University of Hamburg, Grindelallee 117, Hamburg, 20146, Germany
- Marina Creydt
  - Hamburg School of Food Science, University of Hamburg, Grindelallee 117, Hamburg, 20146, Germany
- Markus Fischer
  - Hamburg School of Food Science, University of Hamburg, Grindelallee 117, Hamburg, 20146, Germany
- Stephan Seifert
  - Hamburg School of Food Science, University of Hamburg, Grindelallee 117, Hamburg, 20146, Germany
2
Rubel V, Filker S, Lanzén A, Abad IL, Stoeck T. Exploiting taxonomic information from metagenomes to infer bacterial bioindicators and environmental quality at salmon aquaculture installations. MARINE POLLUTION BULLETIN 2025; 218:118173. [PMID: 40414102 DOI: 10.1016/j.marpolbul.2025.118173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2024] [Revised: 05/14/2025] [Accepted: 05/15/2025] [Indexed: 05/27/2025]
Abstract
Environmental DNA (eDNA) metabarcoding has emerged as a powerful method for assessing the environmental impacts of marine Atlantic salmon aquaculture by identifying bacterial bioindicators and inferring biotic indices. However, because this approach relies on the PCR amplification of 16S rRNA gene fragments, it may introduce errors that compromise bioindicator reliability. In contrast, metagenomic analysis, which captures the complete set of genetic material directly extracted from environmental samples, circumvents biases inherent to PCR amplification. We hypothesized that metagenomic data could offer superior assessments of benthic environmental impacts associated with salmon aquaculture compared to metabarcoding. To test this, we compared bacterial community structures derived from both metabarcoding and metagenomic analyses of 68 sediment samples obtained from aquaculture installation sites characterized by varying degrees of benthic impact, as determined by macroinvertebrate inventories. Bacterial bioindicators were identified from each dataset, and Random Forest models were used to predict the degrees of benthic impact. Metagenomics identified a greater number of bioindicators at both the family and individual sequence variant levels, resulting in higher predictive accuracy for impact assessments. Notably, only a few bioindicators were common to both methods, suggesting that methodological limitations and distorted abundance patterns in metabarcoding data may lead to spurious indicators. These findings highlight both the challenges and potential advantages of employing metagenomics for reliable environmental impact assessments.
Affiliation(s)
- Verena Rubel
  - Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Ecology Group, D-67663 Kaiserslautern, Germany
- Sabine Filker
  - Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Ecology Group, D-67663 Kaiserslautern, Germany
- Anders Lanzén
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Pasaia, Spain; IKERBASQUE, Basque Foundation for Science, Bilbao, Spain
- Ion Luis Abad
  - AZTI, Marine Research, Basque Research and Technology Alliance (BRTA), Pasaia, Spain
- Thorsten Stoeck
  - Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau, Ecology Group, D-67663 Kaiserslautern, Germany
3
Tang LJ, Li XK, Huang Y, Zhang XZ, Li BQ. A novel importance scores based variable selection approach and validation using a MIR and NIR dataset. SPECTROCHIMICA ACTA. PART A, MOLECULAR AND BIOMOLECULAR SPECTROSCOPY 2025; 330:125701. [PMID: 39793249 DOI: 10.1016/j.saa.2025.125701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Revised: 12/07/2024] [Accepted: 01/02/2025] [Indexed: 01/13/2025]
Abstract
Variable selection is important in spectral analysis for improving interpretation quality and accuracy. This study introduces a novel variable selection process, named "VMHBSC", which consists of six steps, with each letter representing one step. To demonstrate its process and advantages, two datasets were employed: a mid-infrared spectral (MIR) dataset (234 × 7468, sample number × variables) of Chenpi samples (a traditional Chinese medicinal material derived from the dried peel of mature tangerines) and a near-infrared spectral (NIR) dataset (16000 × 256) for modeling competition. In the MIR dataset, VMHBSC selected 3 important variables from all 7468 variables, and models established using Decision Trees (DT), Gradient Boosting Decision Tree (GBDT), and Extreme Gradient Boosting (XGBoost) achieved higher accuracy than models using other variable selection methods. For the NIR dataset, VMHBSC selected 24 important variables from all 256 variables. Based on these 24 common variables, three hybrid models (VMHBSC-DT, VMHBSC-GBDT and VMHBSC-XGBoost) were also established and showed stable performance. These findings indicate the effectiveness of the VMHBSC process in enhancing model performance and robustness.
Affiliation(s)
- Li Jun Tang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, PR China
- Xin Kang Li
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, PR China
- Yue Huang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, PR China
- Xiang-Zhi Zhang
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, PR China
- Bao Qiong Li
  - School of Pharmacy and Food Engineering, Wuyi University, Jiangmen 529020, PR China
4
Rose EB, Steele MK, Tolar B, Pettengill J, Batz M, Bazaco M, Tameru B, Cui Z, Lindsey RL, Simmons M, Chen J, Posny D, Carleton H, Bruce BB. Attribution of Salmonella enterica to Food Sources by Using Whole-Genome Sequencing Data. Emerg Infect Dis 2025; 31:783-790. [PMID: 40133041 PMCID: PMC11950287 DOI: 10.3201/eid3104.241172] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/27/2025] Open
Abstract
Salmonella enterica bacteria are a leading cause of foodborne illness in the United States; however, most Salmonella illnesses are not associated with known outbreaks, and predicting the source of sporadic illnesses remains a challenge. We used a supervised random forest model to determine the most likely sources responsible for human salmonellosis cases in the United States. We trained the model by using whole-genome multilocus sequence typing data from 18,661 Salmonella isolates collected from single food sources and used feature selection to determine the subset of loci most influential for prediction. The overall out-of-bag accuracy of the trained model was 91%; the highest prediction accuracy was for chicken (97%). We applied the trained model to 6,470 isolates from humans with unknown exposure to predict the source of infection. Our model predicted that >33% of the human-derived Salmonella isolates originated from chicken and 27% were from vegetables.
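The out-of-bag accuracy quoted above relies on the fact that each tree leaves out part of the training data. The sketch below assumes the usual bootstrap scheme (each tree draws n rows with replacement from n training rows); the abstract does not state the sampling settings actually used, and the helper name `oob_fraction` is hypothetical.

```python
import math


def oob_fraction(n):
    """Expected share of rows left out of one bootstrap sample of size n.

    Each row is missed by a single draw with probability 1 - 1/n, so it is
    missed by all n draws with probability (1 - 1/n) ** n.
    """
    return (1 - 1 / n) ** n


# For a training set the size of the one in the abstract (18,661 isolates),
# the share is very close to the limit exp(-1), i.e. a little over a third.
share = oob_fraction(18661)
limit = math.exp(-1)
```

Under that assumption, roughly a third of the isolates are available as a held-out check for any given tree, which is what makes an out-of-bag estimate possible without a separate test set.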
5
Molenaar JM, Leung KY, van der Meer L, Klein PPF, Struijs JN, Kiefte-de Jong JC. Predicting population-level vulnerability among pregnant women using routinely collected data and the added relevance of self-reported data. Eur J Public Health 2024; 34:1210-1217. [PMID: 39602553 PMCID: PMC11631480 DOI: 10.1093/eurpub/ckae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2024] Open
Abstract
Recognizing and addressing vulnerability during the first thousand days of life can prevent health inequities. It is necessary to determine the best data for predicting multidimensional vulnerability (i.e. risk factors to vulnerability across different domains and a lack of protective factors) at population level to understand national prevalence and trends. This study aimed to (1) assess the feasibility of predicting multidimensional vulnerability during pregnancy using routinely collected data, (2) explore potential improvement of these predictions by adding self-reported data on health, well-being, and lifestyle, and (3) identify the most relevant predictors. The study was conducted using Dutch nationwide routinely collected data and self-reported Public Health Monitor data. First, to predict multidimensional vulnerability using routinely collected data, we used random forest (RF) and considered the area under the curve (AUC) and F1 measure to assess RF model performance. To validate results, sensitivity analyses (XGBoost and Lasso) were done. Second, we gradually added self-reported data to predictions. Third, we explored the RF model's variable importance. The initial RF model could distinguish between those with and without multidimensional vulnerability (AUC = 0.98). The model was able to correctly predict multidimensional vulnerability in most cases, but there was also misclassification (F1 measure = 0.70). Adding self-reported data improved RF model performance (e.g. F1 measure = 0.80 after adding perceived health). The strongest predictors concerned self-reported health, socioeconomic characteristics, and healthcare expenditures and utilization. It seems possible to predict multidimensional vulnerability using routinely collected data that is readily available. However, adding self-reported data can improve predictions.
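The abstract pairs a very high AUC with a noticeably lower F1 measure. A minimal sketch of why the two can diverge follows; the labels, scores, and the 0.5 threshold are hypothetical toy values, not data from the study. AUC is computed here as the share of positive/negative pairs ranked correctly, and F1 from the counts at one fixed threshold.

```python
def auc(labels, scores):
    """Area under the ROC curve as the share of correctly ranked pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f1(labels, scores, threshold=0.5):
    """F1 measure when every score at or above the threshold is called positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical imbalanced sample: 2 positives, 8 negatives.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.7, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]

ranking_quality = auc(labels, scores)   # 15 of 16 pairs ranked correctly
threshold_quality = f1(labels, scores)  # one false positive lowers precision
```

Because positives are rare in this toy sample, a single false positive costs a third of the precision while barely affecting the ranking, so F1 falls well below AUC.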
Affiliation(s)
- Joyce M Molenaar
  - Population Health and Health Services Research, Centre for Public Health, Healthcare and Society, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
  - Department of Public Health and Primary Care/Health Campus The Hague, Leiden University Medical Centre, The Hague, the Netherlands
- Ka Yin Leung
  - Department of Statistics, Data Science and Modelling, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
- Lindsey van der Meer
  - Department of Obstetrics and Gynaecology, Erasmus MC, University Medical Centre, Rotterdam, the Netherlands
- Peter Paul F Klein
  - Population Health and Health Services Research, Centre for Public Health, Healthcare and Society, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
- Jeroen N Struijs
  - Population Health and Health Services Research, Centre for Public Health, Healthcare and Society, National Institute for Public Health and the Environment (RIVM), Bilthoven, the Netherlands
  - Department of Public Health and Primary Care/Health Campus The Hague, Leiden University Medical Centre, The Hague, the Netherlands
- Jessica C Kiefte-de Jong
  - Department of Public Health and Primary Care/Health Campus The Hague, Leiden University Medical Centre, The Hague, the Netherlands
6
Dietrichson J, Klokker R, Filges T, Bengtsen E, Pigott TD. Protocol: Machine learning for selecting moderators in meta-analysis: A systematic review of methods and their applications, and an evaluation using data on tutoring interventions. CAMPBELL SYSTEMATIC REVIEWS 2024; 20:e70009. [PMID: 39664510 PMCID: PMC11632158 DOI: 10.1002/cl2.70009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 11/08/2024] [Indexed: 12/13/2024]
Abstract
Objectives: This is the protocol for a Campbell systematic review. The objectives are as follows. The first objective is to find and describe machine and statistical learning (ML) methods designed for moderator meta-analysis. The second objective is to find and describe applications of such ML methods in moderator meta-analyses of health, medical, and social science interventions. These two parts of the meta-review will primarily involve a systematic review and will be conducted according to guidelines specified by the Campbell Collaboration (MECCIR guidelines). The outcomes will be a list of ML methods that are designed for moderator meta-analysis (first objective), and a description of how (some of) these methods have been applied in the health, medical, and social sciences (second objective). The third objective is to examine how the ML methods identified in the meta-review can help researchers formulate new hypotheses or select among existing ones, and to compare the identified methods to one another and to regular meta-regression methods for moderator analysis. To compare the performance of different moderator meta-analysis methods, we will apply the methods to data on tutoring interventions from two systematic reviews of interventions to improve academic achievement for students with or at risk of academic difficulties, and to an independent test sample of tutoring studies published after the search period in the two reviews.
Affiliation(s)
- Jens Dietrichson
  - Quantitative Methods, VIVE – The Danish Center for Social Science Research, Copenhagen, Denmark
- Rasmus Klokker
  - Quantitative Methods, VIVE – The Danish Center for Social Science Research, Copenhagen, Denmark
- Trine Filges
  - Quantitative Methods, VIVE – The Danish Center for Social Science Research, Copenhagen, Denmark
- Elizabeth Bengtsen
  - Administration, VIVE – The Danish Center for Social Science Research, Copenhagen, Denmark
7
Fang W, Ren K, Liu T, Shang J, Jia S, Jiang X, Zhang J. An evaluation of random forest based input variable selection methods for one month ahead streamflow forecasting. Sci Rep 2024; 14:29766. [PMID: 39613890 DOI: 10.1038/s41598-024-81502-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2024] [Accepted: 11/27/2024] [Indexed: 12/01/2024] Open
Abstract
In the development of data-driven models for streamflow forecasting, choosing appropriate input variables is crucial. Although random forest (RF) has been successfully applied to input variable selection (IVS) for streamflow forecasting, a comparative analysis of different random forest-based IVS (RF-IVS) methods is still lacking. Here, we investigate the performance of five RF-IVS methods in four data-driven models (RF, support vector regression (SVR), Gaussian process regression (GP), and long short-term memory (LSTM)). A case study is implemented in the contiguous United States for one-month-ahead streamflow forecasting. Results indicate that RF-IVS methods yield better performance than the widely used partial Pearson correlation and conditional mutual information methods. Meanwhile, performance-based RF-IVS methods appear to be superior to test-based methods, and the test-based methods tend to select redundant variables. Finally, RF with a forward selection strategy, combined with a GP model, is recommended as a promising combination with the potential to yield favorable performance.
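The forward selection strategy named in the abstract can be sketched generically as a greedy loop: add the candidate input that improves a performance score most, and stop when no candidate helps. This is an illustration under stated assumptions, not the paper's procedure. The function `forward_selection`, the variable names, and the toy scoring function are all hypothetical; in practice the score would come from fitting and validating a model such as RF.

```python
def forward_selection(candidates, score, min_gain=0.0):
    """Greedy forward selection driven by a user-supplied score function.

    `score` takes a list of variable names and returns a number where
    higher is better (for example a validation skill score).
    """
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        # Score every one-variable extension of the current subset.
        trials = [(score(selected + [c]), c) for c in remaining]
        top_score, top = max(trials, key=lambda t: t[0])
        if top_score - best <= min_gain:
            break  # no candidate improves the score enough
        selected.append(top)
        remaining.remove(top)
        best = top_score
    return selected, best


# Toy score (hypothetical): each variable carries information for one group,
# only the best variable per group counts, and every variable costs 0.01.
INFO = {
    "precip_lag1": ("precip", 0.5),
    "flow_lag1": ("flow", 0.3),
    "flow_lag2": ("flow", 0.25),   # largely redundant with flow_lag1
    "temp_lag1": ("temp", 0.005),  # nearly uninformative
}


def toy_score(subset):
    per_group = {}
    for name in subset:
        group, value = INFO[name]
        per_group[group] = max(per_group.get(group, 0.0), value)
    return sum(per_group.values()) - 0.01 * len(subset)


chosen, final_score = forward_selection(list(INFO), toy_score)
```

With this toy score the loop keeps `precip_lag1` and `flow_lag1` and then stops, because the redundant and the nearly uninformative inputs no longer pay for their cost. That is the behaviour a performance-based selection is meant to show.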
Affiliation(s)
- Wei Fang
  - Yinshanbeilu Grassland Eco-hydrology National Observation and Research Station, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
  - State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, School of Water Resources and Hydropower, Xi'an University of Technology, Xi'an, 710048, China
- Kun Ren
  - North China University of Water Resources and Electric Power, Zhengzhou, 450046, China
- Tiejun Liu
  - Yinshanbeilu Grassland Eco-hydrology National Observation and Research Station, China Institute of Water Resources and Hydropower Research, Beijing, 100038, China
- Jianan Shang
  - State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, School of Water Resources and Hydropower, Xi'an University of Technology, Xi'an, 710048, China
- Shengce Jia
  - State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, School of Water Resources and Hydropower, Xi'an University of Technology, Xi'an, 710048, China
- Xiangxiang Jiang
  - State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, School of Water Resources and Hydropower, Xi'an University of Technology, Xi'an, 710048, China
- Jie Zhang
  - State Key Laboratory of Eco-hydraulics in Northwest Arid Region of China, School of Water Resources and Hydropower, Xi'an University of Technology, Xi'an, 710048, China
8
Yuan LL, Mitchell RM, Pilgrim EM, Smucker NJ. Inferences based on diatom compositions improve estimates of nutrient concentrations in streams. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 952:176032. [PMID: 39236813 PMCID: PMC11481158 DOI: 10.1016/j.scitotenv.2024.176032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 08/15/2024] [Accepted: 09/02/2024] [Indexed: 09/07/2024]
Abstract
Nutrient concentrations in streams vary strongly with flow conditions, and routinely gathered field measurements of nutrients reflect this variability. Diatom assemblage composition has been used in previous studies to infer nutrient concentrations, and because diatoms integrate nutrient concentrations over longer periods of time, diatom inferences may be less susceptible to fluctuations in streamflow. We tested this hypothesis by leveraging differences in the flashiness of streams across a large continental data set. More specifically, we tested whether the variabilities of direct measurements and diatom inferences of dissolved phosphorus and nitrate were greater in flashy versus non-flashy streams. We further considered whether models linking landscape predictor variables to nutrient concentrations yielded consistent results across flashy and non-flashy streams. Our analysis indicated that measured nutrient concentrations were more variable in flashy compared to non-flashy streams and that landscape models identified different important predictors of nutrient concentrations when fit using data from flashy vs. non-flashy streams. In contrast, variabilities of diatom-inferred nutrient concentrations were similar among stream types, as were the important predictor variables (e.g., manure application rates for nitrate and number of wet days for dissolved phosphorus). These analyses indicate that use of diatom-inferred nutrient concentrations can potentially improve efforts to quantify stream nutrient concentrations.
Affiliation(s)
- Lester L Yuan
  - Office of Water, U.S. Environmental Protection Agency, 1200 Pennsylvania Ave NW, Mail code 4304T, Washington, DC 20460, USA
- Richard M Mitchell
  - Office of Water, U.S. Environmental Protection Agency, 1200 Pennsylvania Ave NW, Mail code 4304T, Washington, DC 20460, USA
- Erik M Pilgrim
  - Office of Research and Development, U.S. Environmental Protection Agency, 26 West Martin Luther King Drive, Mail stop 587, Cincinnati, OH 45268, USA
- Nathan J Smucker
  - Office of Research and Development, U.S. Environmental Protection Agency, 26 West Martin Luther King Drive, Mail stop 587, Cincinnati, OH 45268, USA
9
Smith HL, Biggs PJ, French NP, Smith ANH, Marshall JC. Out of (the) bag-encoding categorical predictors impacts out-of-bag samples. PeerJ Comput Sci 2024; 10:e2445. [PMID: 39650463 PMCID: PMC11623134 DOI: 10.7717/peerj-cs.2445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Accepted: 10/01/2024] [Indexed: 12/11/2024]
Abstract
Performance of random forest classification models is often assessed and interpreted using out-of-bag (OOB) samples. Observations which are OOB when a tree is trained may serve as a test set for that tree, and predictions from the OOB observations are used to calculate OOB error and variable importance measures (VIM). OOB errors are popular because they are fast to compute and, for large samples, are a good estimate of the true prediction error. In this study, we investigate how target-based vs. target-agnostic encoding of categorical predictor variables for random forest can bias performance measures based on OOB samples. We show that, when categorical variables are encoded using a target-based encoding method, and when the encoding takes place prior to bagging, the OOB sample can underestimate the true misclassification rate and overestimate variable importance. We recommend using a separate test data set when evaluating variable importance and/or predictive performance of tree-based methods that utilise a target-based encoding method.
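The leakage mechanism described in the abstract can be illustrated with a deliberately extreme toy case. The sketch below is an assumption-laden illustration, not the authors' experiment: the predictor is pure noise (one unique category per row), mean-target encoding stands in for "target-based encoding", and a single threshold rule stands in for a tree. All names (`target_encode`, `oob_error`) are hypothetical.

```python
import random


def target_encode(categories, targets, fit_rows):
    """Mean-target encoding fitted on `fit_rows` only.

    Categories not seen in `fit_rows` receive the mean target of `fit_rows`.
    """
    sums, counts = {}, {}
    for i in fit_rows:
        sums[categories[i]] = sums.get(categories[i], 0) + targets[i]
        counts[categories[i]] = counts.get(categories[i], 0) + 1
    prior = sum(targets[i] for i in fit_rows) / len(fit_rows)
    return [sums[c] / counts[c] if c in counts else prior for c in categories]


rng = random.Random(0)
n = 200
categories = list(range(n))                      # uninformative: unique level per row
targets = [rng.randint(0, 1) for _ in range(n)]  # random binary outcome

in_bag = [rng.randrange(n) for _ in range(n)]    # one bootstrap sample
oob = sorted(set(range(n)) - set(in_bag))        # rows the "tree" never saw


def oob_error(encoded):
    # Stand-in for a tree: predict class 1 when the encoded value exceeds 0.5.
    wrong = sum((encoded[i] > 0.5) != bool(targets[i]) for i in oob)
    return wrong / len(oob)


# Encoding before bagging: every row's code is built from its own target.
leaky = target_encode(categories, targets, range(n))
# Encoding inside the bag: OOB rows contribute nothing to their own code.
clean = target_encode(categories, targets, in_bag)

leaky_error = oob_error(leaky)   # zero: the code simply repeats the label
clean_error = oob_error(clean)   # well above zero for a noise predictor
```

The predictor carries no real signal, yet encoding before bagging drives the OOB error to zero, which is the kind of optimistic bias that a separate test set would expose.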
Affiliation(s)
- Helen L. Smith
  - School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
- Patrick J. Biggs
  - School of Food Technology and Natural Sciences, Massey University, Palmerston North, New Zealand
  - NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Nigel P. French
  - NZ Food Safety and Science Research Centre, Massey University, Palmerston North, New Zealand
  - School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Adam N. H. Smith
  - School of Mathematical and Computational Sciences, Massey University, Auckland, New Zealand
- Jonathan C. Marshall
  - School of Mathematical and Computational Sciences, Massey University, Palmerston North, New Zealand
10
Gilman E, Chaloupka M, Posanau N, Hidalgo M, Pokajam S, Papaol D, Nanguromo A, Poisson F. Evidence to inform spatiotemporal management of a western Pacific Ocean tuna purse seine fishery. ECOLOGICAL APPLICATIONS : A PUBLICATION OF THE ECOLOGICAL SOCIETY OF AMERICA 2024:e3054. [PMID: 39460428 DOI: 10.1002/eap.3054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 07/30/2024] [Accepted: 08/26/2024] [Indexed: 10/28/2024]
Abstract
Fisheries can profoundly impact co-occurring species exposed to incidental capture. Spatiotemporal fisheries management holds substantial potential to balance socioeconomic benefits with ecological costs to threatened bycatch species. This study estimated the effect of the spatial and temporal distribution of effort by a western Pacific Ocean tuna purse seine fishery on catch rates of target and at-risk species by fitting spatially explicit generalized additive multilevel regression models within a Bayesian inference framework to observer data. Mean field prediction surfaces defined catch rate hotspots for tunas, silky sharks, rays, and whale sharks, informing the design of candidate area-based management strategies. Due to limited sample sizes, odontocete and marine turtle catch rate geospatial patterns were summarized using simple 2D hexagonal binning. Effort could be focused in two areas within core fishing grounds to reduce overlap with hotspots for silky sharks, rays, and whale sharks without affecting target catch. Effort could be shifted outside of core fishing areas to zones with higher target tuna catch rates to reduce overlap with hotspots for at-risk species. Sparse and small marine turtle and whale shark hotspots occurred across the fishing grounds. Results did not identify opportunities for temporally dynamic spatial management to balance target and at-risk catch rates. Research on the economic and operational viability of alternative spatial management strategies is a priority. A small subset of sets had disproportionately large odontocete captures. Real-time fleet communication, move-on rules, and avoiding sets on dolphin schools might reduce odontocete catch rates. Managing set association type and mesh size present additional opportunities to balance catch rates of at-risk and target species. Employing output controls that effectively constrain the fishery would alter the spatial management strategy to focus fishing within zones with the lowest ratio of at-risk bycatch to target tuna catch. Findings inform the design of alternative spatial management strategies to avoid catch rate hotspots of at-risk species without compromising the catch of principal market species.
Affiliation(s)
- Eric Gilman
  - Fisheries Research Group, The Safina Center, Honolulu, Hawaii, USA
- Milani Chaloupka
  - Ecological Modelling Services Pty Ltd and Marine Spatial Ecology Lab, University of Queensland, Brisbane, Queensland, Australia
- Nialangis Posanau
  - Papua New Guinea Fishing Industry Association, Port Moresby, Papua New Guinea
- Marcelo Hidalgo
  - Papua New Guinea Fishing Industry Association, Port Moresby, Papua New Guinea
- Sylvester Pokajam
  - Papua New Guinea Fishing Industry Association, Port Moresby, Papua New Guinea
- Donald Papaol
  - Papua New Guinea Fishing Industry Association, Port Moresby, Papua New Guinea
- Adrian Nanguromo
  - Papua New Guinea National Fisheries Authority, Port Moresby, Papua New Guinea
- Francois Poisson
  - MARBEC IFREMER, IRD CNRS, University of Montpellier, Sète, France
11
Teo HC, Sarira TV, Tan ARP, Cheng Y, Koh LP. Charting the future of high forest low deforestation jurisdictions. Proc Natl Acad Sci U S A 2024; 121:e2306496121. [PMID: 39226355 PMCID: PMC11406276 DOI: 10.1073/pnas.2306496121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 06/14/2024] [Indexed: 09/05/2024] Open
Abstract
High forest low deforestation jurisdictions (HFLDs) contain many of the world's last intact forests with historically low deforestation. Since carbon financing typically uses historical deforestation rates as baselines, HFLDs facing the prospect of future threats may receive insufficient incentives to be protected. We found that from 2002 to 2020, HFLDs (n = 310) experienced 44% higher deforestation rates than their historical baselines, and 60 HFLDs underwent periods of high deforestation (deforestation rate > 0.501%) at 0.983 ± 0.649% (mean ± SD), a rate 7.5 times higher than the 10-y historical baseline of all HFLDs. For HFLDs to receive sufficient carbon finance requires baselines that can better reflect future deforestation trajectories of HFLDs. Using an empirical multifactorial model, we show that most contemporary HFLDs are expected to undergo higher deforestation from 2020 to 2038 than their historical baselines, with 72 HFLDs likely (>66% probability) to undergo high deforestation. Over the next 18 y, HFLDs are expected to lose 2.16 Mha y⁻¹ of forests corresponding to 585 ± 74 MtCO₂e y⁻¹ (mean ± SE) of emissions. Efforts to protect HFLD forests from future threats will be crucial. In particular, improving baselining methods is key to ensuring that sufficient financing can flow to HFLDs to prevent deforestation.
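The headline figures in the abstract can be related to one another with simple arithmetic. The values below are derived here from the rounded numbers quoted above and are not reported by the authors; in particular, "7.5 times higher" is read as a plain ratio of 7.5, which is an assumption, and the variable names are hypothetical.

```python
# Figures quoted in the abstract (rounded as published).
high_rate_pct = 0.983          # mean rate during high-deforestation periods, %
ratio_to_baseline = 7.5        # "7.5 times higher", read as a ratio (assumption)
forest_loss_mha_per_y = 2.16   # projected forest loss, Mha per year
emissions_mt_per_y = 585       # projected emissions, MtCO2e per year
horizon_y = 18                 # projection horizon, 2020 to 2038

# Baseline rate implied by the ratio, in percent.
implied_baseline_pct = high_rate_pct / ratio_to_baseline

# Total projected forest loss over the horizon, in Mha.
cumulative_loss_mha = forest_loss_mha_per_y * horizon_y

# Emissions per hectare lost; Mt per Mha is the same as t per ha.
emissions_t_per_ha = emissions_mt_per_y / forest_loss_mha_per_y
```

On that reading, the implied baseline is about 0.13%, the cumulative loss is about 39 Mha, and the emissions intensity is about 271 tCO2e per hectare. These are back-of-envelope values only.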
Affiliation(s)
- Hoong Chen Teo
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
  - Centre for Nature-based Climate Solutions, National University of Singapore, Singapore 117546, Singapore
- Tasya Vadya Sarira
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
  - Centre for Nature-based Climate Solutions, National University of Singapore, Singapore 117546, Singapore
- Audrey R P Tan
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
  - Centre for Nature-based Climate Solutions, National University of Singapore, Singapore 117546, Singapore
- Yanyan Cheng
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
  - Centre for Nature-based Climate Solutions, National University of Singapore, Singapore 117546, Singapore
  - Department of Industrial Systems Engineering & Management, National University of Singapore, Singapore 117576, Singapore
- Lian Pin Koh
  - Department of Biological Sciences, National University of Singapore, Singapore 117558, Singapore
  - Centre for Nature-based Climate Solutions, National University of Singapore, Singapore 117546, Singapore
  - Tropical Marine Science Institute, National University of Singapore, Singapore 119222, Singapore
12
Kneipp J, Seifert S, Gärber F. SERS microscopy as a tool for comprehensive biochemical characterization in complex samples. Chem Soc Rev 2024; 53:7641-7656. [PMID: 38934892 DOI: 10.1039/d4cs00460d] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/28/2024]
Abstract
Surface enhanced Raman scattering (SERS) spectra of biomaterials such as cells or tissues can be used to obtain biochemical information from nanoscopic volumes in these heterogeneous samples. This tutorial review discusses the factors that determine the outcome of a SERS experiment in complex bioorganic samples. They are related to the SERS process itself, the possibility to selectively probe certain regions or constituents of a sample, and the retrieval of the vibrational information in order to identify molecules and their interaction. After introducing basic aspects of SERS experiments in the context of biocompatible environments, spectroscopy in typical microscopic settings is exemplified, including the possibilities to combine SERS with other linear and non-linear microscopic tools, and to exploit approaches that improve lateral and temporal resolution. In particular the great variation of data in a SERS experiment calls for robust data analysis tools. Approaches will be introduced that have been originally developed in the field of bioinformatics for the application to omics data and that show specific potential in the analysis of SERS data. They include the use of simulated data and machine learning tools that can yield chemical information beyond achieving spectral classification.
Affiliation(s)
- Janina Kneipp
- Department of Chemistry, Humboldt-Universität zu Berlin, Brook-Taylor-Str. 2, 12489 Berlin, Germany.
- Stephan Seifert
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
- Florian Gärber
- Hamburg School of Food Science, Department of Chemistry, Universität Hamburg, Grindelallee 117, 20146 Hamburg, Germany
13
Ducatez F, Tebani A, Abily-Donval L, Snanoudj S, Pilon C, Plichet T, Le Chatelier C, Bekri S, Marret S. New insights and potential biomarkers for intraventricular hemorrhage in extremely premature infant, case-control study. Pediatr Res 2024; 96:395-401. [PMID: 38467704 DOI: 10.1038/s41390-024-03111-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2023] [Revised: 02/13/2024] [Accepted: 02/15/2024] [Indexed: 03/13/2024]
Abstract
BACKGROUND Despite advancements in neonatal care, germinal matrix-intraventricular hemorrhage impacts 20% of very preterm infants, exacerbating their neurological prognosis. Understanding its complex, multifactorial pathophysiology and rapid onset remains challenging. This study aims to link specific cord blood biomolecules at birth with post-natal germinal matrix-intraventricular hemorrhage onset. METHODS A monocentric, prospective case-control study was conducted at Rouen University Hospital from 2015 to 2020. Premature newborns (<30 weeks gestational age) were included and cord blood was sampled in the delivery room. A retrospective matching procedure was held in 2021 to select samples for proteomic and metabolomic analysis of 370 biomolecules. RESULTS Twenty-six germinal matrix-intraventricular hemorrhage cases and 60 controls were included. Clinical differences were minimal, except for higher invasive ventilation rates in the germinal matrix-intraventricular hemorrhage group. Germinal matrix-intraventricular hemorrhage newborns exhibited lower phosphatidylcholine levels and elevated levels of four proteins: BOC cell adhesion-associated protein, placental growth factor, leukocyte-associated immunoglobulin-like receptor 2, and tumor necrosis factor-related apoptosis-inducing ligand receptor 2. CONCLUSION This study identifies biomolecules that may be linked to subsequent germinal matrix-intraventricular hemorrhage, suggesting heightened vascular disruption risk as an independent factor. These results need further validation but could serve as early germinal matrix-intraventricular hemorrhage risk biomarkers for future evaluations. IMPACT A decrease in certain phosphatidylcholines and an increase in four proteins in cord blood at birth may be linked to subsequent germinal matrix-intraventricular hemorrhage in premature newborns. The four proteins are BOC cell adhesion-associated protein, placental growth factor, leukocyte-associated immunoglobulin-like receptor 2, and TNF-related apoptosis-inducing ligand receptor 2. This biological imprint could point toward higher vascular disruption risk as an independent risk factor for this complication and, with further validation, could be used for better stratification of premature newborns at birth.
Affiliation(s)
- Franklin Ducatez
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Neonatal Pediatrics, Intensive Care, and Neuropediatrics, 76000, Rouen, France
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Abdellah Tebani
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Lenaig Abily-Donval
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Neonatal Pediatrics, Intensive Care, and Neuropediatrics, 76000, Rouen, France
- Sarah Snanoudj
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Carine Pilon
- CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Thomas Plichet
- CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Charlotte Le Chatelier
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Neonatal Pediatrics, Intensive Care, and Neuropediatrics, 76000, Rouen, France
- Soumeya Bekri
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Metabolic Biochemistry, 76000, Rouen, France
- Stéphane Marret
- Normandie Univ, UNIROUEN, INSERM U1245, CHU Rouen, Department of Neonatal Pediatrics, Intensive Care, and Neuropediatrics, 76000, Rouen, France.
14
Winter PS, Ramseier ML, Navia AW, Saksena S, Strouf H, Senhaji N, DenAdel A, Mirza M, An HH, Bilal L, Dennis P, Leahy CS, Shigemori K, Galves-Reyes J, Zhang Y, Powers F, Mulugeta N, Gupta AJ, Calistri N, Van Scoyk A, Jones K, Liu H, Stevenson KE, Ren S, Luskin MR, Couturier CP, Amini AP, Raghavan S, Kimmerling RJ, Stevens MM, Crawford L, Weinstock DM, Manalis SR, Shalek AK, Murakami MA. Mutation and cell state compatibility is required and targetable in Ph+ acute lymphoblastic leukemia minimal residual disease. bioRxiv 2024:2024.06.06.597767. [PMID: 38915726 PMCID: PMC11195125 DOI: 10.1101/2024.06.06.597767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024]
Abstract
Efforts to cure BCR::ABL1 B cell acute lymphoblastic leukemia (Ph+ ALL) solely through inhibition of ABL1 kinase activity have thus far been insufficient despite the availability of tyrosine kinase inhibitors (TKIs) with broad activity against resistance mutants. The mechanisms that drive persistence within minimal residual disease (MRD) remain poorly understood and therefore untargeted. Utilizing 13 patient-derived xenograft (PDX) models and clinical trial specimens of Ph+ ALL, we examined how genetic and transcriptional features co-evolve to drive progression during prolonged TKI response. Our work reveals a landscape of cooperative mutational and transcriptional escape mechanisms that differ from those causing resistance to first generation TKIs. By analyzing MRD during remission, we show that the same resistance mutation can either increase or decrease cellular fitness depending on transcriptional state. We further demonstrate that directly targeting transcriptional state-associated vulnerabilities at MRD can overcome BCR::ABL1 independence, suggesting a new paradigm for rationally eradicating MRD prior to relapse. Finally, we illustrate how cell mass measurements of leukemia cells can be used to rapidly monitor dominant transcriptional features of Ph+ ALL to help rationally guide therapeutic selection from low-input samples.
Affiliation(s)
- Peter S. Winter
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Michelle L. Ramseier
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Andrew W. Navia
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Sachit Saksena
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA, USA
- Computational and Systems Biology Program, MIT, Cambridge, MA, USA
- Haley Strouf
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Nezha Senhaji
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Department of Biostatistics, Brown University, Providence, RI, USA
- Mahnoor Mirza
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Hyun Hwan An
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Laura Bilal
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Peter Dennis
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Catharine S. Leahy
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Kay Shigemori
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Jennyfer Galves-Reyes
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Ye Zhang
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Department of Biological Engineering, MIT, Cambridge, MA, USA
- Foster Powers
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Nolawit Mulugeta
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Nicholas Calistri
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Alex Van Scoyk
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Kristen Jones
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Huiyun Liu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Siyang Ren
- Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Marlise R. Luskin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Charles P. Couturier
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Srivatsan Raghavan
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Mark M. Stevens
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI, USA
- Department of Biostatistics, Brown University, Providence, RI, USA
- Microsoft Research, Cambridge, MA, USA
- David M. Weinstock
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
- Current Address: Merck and Co., Rahway, NJ, USA
- Scott R. Manalis
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Department of Biological Engineering, MIT, Cambridge, MA, USA
- Alex K. Shalek
- Koch Institute for Integrative Cancer Research, MIT, Cambridge, MA, USA
- Institute for Medical Engineering & Science, MIT, Cambridge, MA, USA
- Department of Chemistry, MIT, Cambridge, MA, USA
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA, USA
- Mark A. Murakami
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
15
Yoo HJ, Koo B, Yong CW, Lee KS. Prediction of gait recovery using machine learning algorithms in patients with spinal cord injury. Medicine (Baltimore) 2024; 103:e38286. [PMID: 38847729 PMCID: PMC11155515 DOI: 10.1097/md.0000000000038286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2024] [Accepted: 04/26/2024] [Indexed: 06/10/2024] Open
Abstract
With advances in artificial intelligence, machine learning (ML) has been widely applied to predict functional outcomes in clinical medicine. However, there has been no attempt to predict walking ability after spinal cord injury (SCI) based on ML. The main purpose of this study was therefore to predict gait recovery after SCI at discharge from an acute rehabilitation facility using various ML algorithms. In addition, we explored important variables that were related to the prognosis. Finally, we attempted to suggest an ML-based decision support system (DSS) for predicting gait recovery after SCI. Data were collected retrospectively from patients with SCI admitted to an acute rehabilitation facility between June 2008 and December 2021. Linear regression analysis and ML algorithms (random forest [RF], decision tree [DT], and support vector machine) were used to predict the functional ambulation category at the time of discharge (FAC_DC) in patients with traumatic or non-traumatic SCI (n = 353). The independent variables were age, sex, duration of acute care and rehabilitation, comorbidities, neurological information entered into the International Standards for Neurological Classification of SCI worksheet, and somatosensory-evoked potentials at the time of admission to the acute rehabilitation facility. In addition, the importance of variables and a DT-based DSS for FAC_DC were analyzed. RF and DT accurately predicted the FAC_DC, as measured by the root mean squared error. The root mean squared errors of RF and DT were 1.09 and 1.24 for all participants, 1.20 and 1.06 for those with trauma, and 1.12 and 1.03 for those with non-trauma, respectively. In the analysis of important variables, the initial FAC was found to be the most influential factor in all groups. In addition, we could provide a simple DSS based on strong predictors such as the initial FAC, American Spinal Injury Association Impairment Scale grades, and neurological level of injury. In conclusion, we show for the first time that ML can accurately predict gait recovery after SCI. By focusing on important variables and DSS, we can guide early prognosis and establish personalized rehabilitation strategies in acute rehabilitation hospitals.
Affiliation(s)
- Hyun-Joon Yoo
- Korea University Research Institute for Medical Bigdata Science, Korea University College of Medicine, Seoul, Republic of Korea
- Bummo Koo
- School of Health and Environmental Science, Korea University College of Health Science, Seoul, Republic of Korea
- Chan-woo Yong
- School of Health and Environmental Science, Korea University College of Health Science, Seoul, Republic of Korea
- Kwang-Sig Lee
- AI Center, Korea University Anam Hospital, Korea University College of Medicine, Seoul, Republic of Korea
16
Li C, Meng X. Effective analysis of job satisfaction among medical staff in Chinese public hospitals: a random forest model. Front Public Health 2024; 12:1357709. [PMID: 38699429 PMCID: PMC11063264 DOI: 10.3389/fpubh.2024.1357709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/05/2024] [Indexed: 05/05/2024] Open
Abstract
Objective This study explored the factors affecting job satisfaction among medical staff in Chinese public hospitals, and their degree of influence, by constructing the optimal discriminant model. Methods The participant sample is based on the service volume of 12,405 officially appointed medical staff from different departments of 16 public hospitals for three consecutive years from 2017 to 2019. Medical staff (doctors, nurses, administrative personnel) invited to participate in the survey in a given year did not participate again in later years. The importance of all associated factors and the optimal evaluation model were calculated. Results The overall job satisfaction of medical staff is 25.62%. The most important factors affecting medical staff satisfaction are: Value staff opinions (Q10), Get recognition for your work (Q11), Democracy (Q9), and Performance Evaluation Satisfaction (Q5). The random forest model is the best evaluation model for medical staff satisfaction, and its prediction accuracy is higher than that of other similar models. Conclusion The improvement of medical staff job satisfaction is significantly related to the improvement of democracy, recognition of work, and increased employee performance. The results show that improving these five key variables can maximize the job satisfaction and motivation of medical staff. The random forest model can maximize the accuracy and effectiveness of similar research.
Affiliation(s)
- Xuehui Meng
- Department of Health Service Management, Humanities and Management School, Zhejiang Chinese Medical University, Hangzhou, China
17
Hu J, Szymczak S. Evaluation of network-guided random forest for disease gene discovery. BioData Min 2024; 17:10. [PMID: 38627770 PMCID: PMC11020917 DOI: 10.1186/s13040-024-00361-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 04/09/2024] [Indexed: 04/20/2024] Open
Abstract
BACKGROUND Gene network information is believed to be beneficial for disease module and pathway identification, but has not been explicitly utilized in the standard random forest (RF) algorithm for gene expression data analysis. We investigate the performance of a network-guided RF where the network information is summarized into a sampling probability of predictor variables, which is then used in the construction of the RF. RESULTS Our simulation results suggest that network-guided RF does not provide better disease prediction than the standard RF. In terms of disease gene discovery, if disease genes form module(s), network-guided RF identifies them more accurately. In addition, when disease status is independent of genes in the given network, spurious gene selection results can occur when using network information, especially on hub genes. Our empirical analysis on two balanced microarray and RNA-Seq breast cancer datasets from The Cancer Genome Atlas (TCGA) for classification of progesterone receptor (PR) status also demonstrates that network-guided RF can identify genes from PGR-related pathways, which leads to a better connected module of identified genes. CONCLUSIONS Gene networks can provide additional information to aid gene expression analysis for disease module and pathway identification. However, they need to be used with caution, and validation of the results needs to be carried out to guard against spurious gene selection. More robust approaches to incorporate such information into RF construction also warrant further study.
Affiliation(s)
- Jianchang Hu
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany
- Silke Szymczak
- Institute of Medical Biometry and Statistics, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23562, Germany.
18
Alexander H, Hu SK, Krinos AI, Pachiadaki M, Tully BJ, Neely CJ, Reiter T. Eukaryotic genomes from a global metagenomic data set illuminate trophic modes and biogeography of ocean plankton. mBio 2023; 14:e0167623. [PMID: 37947402 PMCID: PMC10746220 DOI: 10.1128/mbio.01676-23] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/30/2023] [Accepted: 09/27/2023] [Indexed: 11/12/2023] Open
Abstract
IMPORTANCE Single-celled eukaryotes play ecologically significant roles in the marine environment, yet fundamental questions about their biodiversity, ecological function, and interactions remain. Environmental sequencing enables researchers to document naturally occurring protistan communities, without culturing bias, yet metagenomic and metatranscriptomic sequencing approaches cannot separate individual species from communities. To more completely capture the genomic content of mixed protistan populations, we can create bins of sequences that represent the same organism (metagenome-assembled genomes [MAGs]). We developed the EukHeist pipeline, which automates the binning of population-level eukaryotic and prokaryotic genomes from metagenomic reads. We show exciting insight into what protistan communities are present and their trophic roles in the ocean. Scalable computational tools, like EukHeist, may accelerate the identification of meaningful genetic signatures from large data sets and complement researchers' efforts to leverage MAG databases for addressing ecological questions, resolving evolutionary relationships, and discovering potentially novel biodiversity.
Affiliation(s)
- Harriet Alexander
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Sarah K. Hu
- Marine Chemistry and Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Arianna I. Krinos
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- MIT-WHOI Joint Program in Oceanography/Applied Ocean Science and Engineering, Cambridge and Woods Hole, Massachusetts, USA
- Maria Pachiadaki
- Biology Department, Woods Hole Oceanographic Institution, Woods Hole, Massachusetts, USA
- Benjamin J. Tully
- Department of Biological Sciences, University of Southern California, Los Angeles, California, USA
- Christopher J. Neely
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Taylor Reiter
- Population Health and Reproduction, University of California, Davis, Davis, California, USA
19
Tran Quoc V, Nguyen Thi Ngoc D, Nguyen Hoang T, Vu Thi H, Tong Duc M, Do Pham Nguyet T, Nguyen Van T, Ho Ngoc D, Vu Son G, Bui Duc T. Predicting Antibiotic Resistance in ICUs Patients by Applying Machine Learning in Vietnam. Infect Drug Resist 2023; 16:5535-5546. [PMID: 37638070 PMCID: PMC10460201 DOI: 10.2147/idr.s415885] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/05/2023] [Accepted: 08/16/2023] [Indexed: 08/29/2023] Open
Abstract
Introduction Artificial Intelligence (AI) and machine learning (ML) are used extensively in high-income countries (HICs) to detect and control antibiotic resistance (AMR) in laboratories and clinical institutions. ML is designed to predict outcome variables using an algorithm to enable "machines" to learn the "rules" from the data. ML is increasingly being applied in intensive care units to identify AMR and to assist empiric antibiotic therapy. This study aimed to evaluate the performance of ML models for predicting AMR bacteria and resistance to antibiotics in two Vietnamese hospitals. Patients and Methods A cross-sectional study combined with a retrospective analysis was conducted from 1st January 2020 to 30th June 2022. Five models were developed to predict antibiotic resistance of bacterial infections in ICU patients. Two datasets were prepared to predict AMR bacteria and antibiotics with ML models. The performance of the prediction models was evaluated by various indicators (sensitivity, specificity, precision, accuracy, F1-score, PRC, AuROC, and normMCC) to determine the optimal time point for data selection. Python version 3.8 was used for statistical analyses. Results The accuracy, F1-score, AuROC, and normMCC of the LightGBM, XGBoost, and Random Forest models were higher than those of other models in both datasets. In both datasets 1 and 2, the accuracy, F1-score, AuROC, and normMCC of the XGBoost model were the highest among the five models (from 0.890 to 1.000). Only Random Forest models had specificity scores higher than 0.850. High scores of sensitivity, accuracy, precision, F1-score, and normMCC indicated that the models were making accurate predictions for datasets 1 and 2. Conclusion XGBoost, LightGBM, and Random Forest were the best-performing machine learning models for predicting antibiotic resistance of bacterial infections in ICU patients using the patients' EMRs.
Affiliation(s)
- Viet Tran Quoc
- Intensive Care Unit, Military Hospital 175, Ho Chi Minh City, Vietnam
- Dung Nguyen Thi Ngoc
- Department of Military Science and Training, Military Hospital 175, Ho Chi Minh City, Vietnam
- Hanoi University of Public Health, Hanoi, Vietnam
- Trung Nguyen Hoang
- Department of Military Hygiene, Vietnam Military Medical University, Hanoi, Vietnam
- Hoa Vu Thi
- Department of Military Hygiene, Vietnam Military Medical University, Hanoi, Vietnam
- Minh Tong Duc
- Department of Military Hygiene, Vietnam Military Medical University, Hanoi, Vietnam
- Thanh Do Pham Nguyet
- Department of Military Science and Training, Military Hospital 175, Ho Chi Minh City, Vietnam
- Thanh Nguyen Van
- Department of General Planning, Military Hospital 175, Ho Chi Minh City, Vietnam
- Diep Ho Ngoc
- Department of Military Science and Training, Military Hospital 175, Ho Chi Minh City, Vietnam
- Giang Vu Son
- Department of Personnel, Military Hospital 175, Ho Chi Minh City, Vietnam
- Thanh Bui Duc
- Institute of Trauma and Orthopedics, Military Hospital 175, Ho Chi Minh City, Vietnam
20
Voges LF, Jarren LC, Seifert S. Exploitation of surrogate variables in random forests for unbiased analysis of mutual impact and importance of features. Bioinformatics 2023; 39:btad471. [PMID: 37522865 PMCID: PMC10403431 DOI: 10.1093/bioinformatics/btad471] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/20/2023] [Revised: 07/06/2023] [Accepted: 07/28/2023] [Indexed: 08/01/2023] Open
Abstract
MOTIVATION Random forest is a popular machine learning approach for the analysis of high-dimensional data because it is flexible and provides variable importance measures for the selection of relevant features. However, the complex relationships between the features are usually not considered for the selection and thus also neglected for the characterization of the analysed samples. RESULTS Here we propose two novel approaches that focus on the mutual impact of features in random forests. Mutual forest impact (MFI) is a relation parameter that evaluates the mutual association of the features to the outcome and, hence, goes beyond the analysis of correlation coefficients. Mutual impurity reduction (MIR) is an importance measure that combines this relation parameter with the importance of the individual features. MIR and MFI are implemented together with testing procedures that generate P-values for the selection of related and important features. Applications to one experimental and various simulated datasets and the comparison to other methods for feature selection and relation analysis show that MFI and MIR are very promising to shed light on the complex relationships between features and outcome. In addition, they are not affected by common biases, e.g. that features with many possible splits or high minor allele frequencies are preferred. AVAILABILITY AND IMPLEMENTATION The approaches are implemented in Version 0.3.3 of the R package RFSurrogates that is available at github.com/AGSeifert/RFSurrogates and the data are available at doi.org/10.25592/uhhfdm.12620.
Affiliation(s)
- Lucas F Voges
- Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany
- Lukas C Jarren
- Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Hamburg 20146, Germany
- Stephan Seifert
- Centre for the Study of Manuscript Cultures (CSMC), Universität Hamburg, Hamburg 20354, Germany
- Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Hamburg 20146, Germany
21
Bambha K, Kim NJ, Sturdevant M, Perkins JD, Kling C, Bakthavatsalam R, Healey P, Dick A, Reyes JD, Biggins SW. Maximizing utility of nondirected living liver donor grafts using machine learning. Front Immunol 2023; 14:1194338. [PMID: 37457719 PMCID: PMC10344453 DOI: 10.3389/fimmu.2023.1194338] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 06/13/2023] [Indexed: 07/18/2023] Open
Abstract
Objective There is an unmet need for optimizing hepatic allograft allocation from nondirected living liver donors (ND-LLD). Materials and method Using OPTN living donor liver transplant (LDLT) data (1/1/2000-12/31/2019), we identified 6328 LDLTs (4621 right, 644 left, 1063 left-lateral grafts). Random forest survival models were constructed to predict 10-year graft survival for each of the 3 graft types. Results Donor-to-recipient body surface area ratio was an important predictor in all 3 models. Other predictors in all 3 models were: malignant diagnosis, medical location at LDLT (inpatient/ICU), and moderate ascites. Biliary atresia was important in left and left-lateral graft models. Re-transplant was important in right graft models. C-index for 10-year graft survival predictions for the 3 models were: 0.70 (left-lateral); 0.63 (left); 0.61 (right). Similar C-indices were found for 1-, 3-, and 5-year graft survivals. Comparison of model predictions to actual 10-year graft survivals demonstrated that the predicted upper quartile survival group in each model had significantly better actual 10-year graft survival compared to the lower quartiles (p<0.005). Conclusion When applied in clinical context, our models assist with the identification and stratification of potential recipients for hepatic grafts from ND-LLD based on predicted graft survivals, while accounting for complex donor-recipient interactions. These analyses highlight the unmet need for granular data collection and machine learning modeling to identify potential recipients who have the best predicted transplant outcomes with ND-LLD grafts.
Affiliation(s)
- Kiran Bambha
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of Washington, Seattle, WA, United States
  - Center for Liver Investigation Fostering discovery (C-LIFE), University of Washington, Seattle, WA, United States
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
- Nicole J. Kim
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of Washington, Seattle, WA, United States
  - Center for Liver Investigation Fostering discovery (C-LIFE), University of Washington, Seattle, WA, United States
- Mark Sturdevant
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Division of Transplant Surgery, Department of Surgery, University of Washington, Seattle, WA, United States
- James D. Perkins
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Division of Transplant Surgery, Department of Surgery, University of Washington, Seattle, WA, United States
- Catherine Kling
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Division of Transplant Surgery, Department of Surgery, University of Washington, Seattle, WA, United States
- Ramasamy Bakthavatsalam
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Division of Transplant Surgery, Department of Surgery, University of Washington, Seattle, WA, United States
- Patrick Healey
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Pediatric Transplant Surgery Division, Department of Surgery, Seattle Children’s Hospital, Seattle, WA, United States
- Andre Dick
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Pediatric Transplant Surgery Division, Department of Surgery, Seattle Children’s Hospital, Seattle, WA, United States
- Jorge D. Reyes
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
  - Division of Transplant Surgery, Department of Surgery, University of Washington, Seattle, WA, United States
  - Pediatric Transplant Surgery Division, Department of Surgery, Seattle Children’s Hospital, Seattle, WA, United States
- Scott W. Biggins
  - Division of Gastroenterology and Hepatology, Department of Medicine, University of Washington, Seattle, WA, United States
  - Center for Liver Investigation Fostering discovery (C-LIFE), University of Washington, Seattle, WA, United States
  - Clinical and Bio-Analytics Transplant Laboratory (C-BATL), University of Washington, Seattle, WA, United States
|
22
|
Kafka JM, Fliss MD, Trangenstein PJ, McNaughton Reyes L, Pence BW, Moracco KE. Detecting intimate partner violence circumstance for suicide: development and validation of a tool using natural language processing and supervised machine learning in the National Violent Death Reporting System. Inj Prev 2023; 29:134-141. [PMID: 36600568 DOI: 10.1136/ip-2022-044662] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/25/2022] [Accepted: 10/19/2022] [Indexed: 12/12/2022]
Abstract
BACKGROUND Intimate partner violence (IPV) victims and perpetrators often report suicidal ideation, yet there is no comprehensive national dataset that allows for an assessment of the connection between IPV and suicide. The National Violent Death Reporting System (NVDRS) captures IPV circumstances for homicide-suicides (<2% of suicides), but not single suicides (suicide unconnected to other violent deaths; >98% of suicides). OBJECTIVE To facilitate a more comprehensive understanding of the co-occurrence of IPV and suicide, we developed and validated a tool that detects mentions of IPV circumstances (yes/no) for single suicides in NVDRS death narratives. METHODS We used 10 000 hand-labelled single suicide cases from NVDRS (2010-2018) to train (n=8500) and validate (n=1500) a classification model using supervised machine learning. We used natural language processing to extract relevant information from the death narratives within a concept normalisation framework. We tested numerous models and present performance metrics for the best approach. RESULTS Our final model had robust sensitivity (0.70), specificity (0.98), precision (0.72) and kappa values (0.69). False positives mostly described other family violence. False negatives used vague and heterogeneous language to describe IPV, and often included abusive suicide threats. IMPLICATIONS It is possible to detect IPV circumstances among single suicides in NVDRS, although vague language in death narratives limited our tool's sensitivity. More attention to the role of IPV in suicide is merited both during the initial death investigation processes and subsequent NVDRS reporting. This tool can support future research to inform targeted prevention.
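Note on the reported metrics: sensitivity, specificity, precision and Cohen's kappa can all be derived from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the function name is hypothetical and the counts in the usage line are invented for illustration, not taken from the study.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # share of true positives that are detected
    specificity = tn / (tn + fp)   # share of true negatives that are detected
    precision = tp / (tp + fp)     # share of positive calls that are correct
    observed = (tp + tn) / n       # observed agreement
    # Agreement expected by chance, from the row and column marginals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

Kappa corrects raw agreement for chance, which matters when the positive class is rare, as IPV mentions are likely to be among all narratives.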
Affiliation(s)
- Julie M Kafka
  - Health Behavior, University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
  - The University of North Carolina Injury Prevention Research Center, Chapel Hill, North Carolina, USA
  - Firearm Injury & Policy Research Program, The University of Washington, Seattle, WA, USA
- Mike D Fliss
  - The University of North Carolina Injury Prevention Research Center, Chapel Hill, North Carolina, USA
  - Epidemiology, The University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
- Pamela J Trangenstein
  - Health Behavior, University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
  - Alcohol Research Group, Emeryville, California, USA
- Luz McNaughton Reyes
  - Health Behavior, University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
  - The University of North Carolina Injury Prevention Research Center, Chapel Hill, North Carolina, USA
- Brian W Pence
  - The University of North Carolina Injury Prevention Research Center, Chapel Hill, North Carolina, USA
  - Epidemiology, The University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
- Kathryn E Moracco
  - Health Behavior, University of North Carolina Gillings School of Global Public Health, Chapel Hill, North Carolina, USA
  - The University of North Carolina Injury Prevention Research Center, Chapel Hill, North Carolina, USA
|
23
|
Bazzi ZA, Sneddon S, Zhang PGY, Tai IT. Characterization of the immune cell landscape in CRC: Clinical implications of tumour-infiltrating leukocytes in early- and late-stage CRC. Front Immunol 2023; 13:978862. [PMID: 36846019 PMCID: PMC9945970 DOI: 10.3389/fimmu.2022.978862] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/26/2022] [Accepted: 12/20/2022] [Indexed: 02/10/2023] Open
Abstract
Introduction: Colorectal cancer (CRC) is the third leading cause of cancer-related deaths globally. Tumour-infiltrating leukocytes play an important role in cancers, including CRC. We therefore sought to characterize the impact of tumour-infiltrating leukocytes on CRC prognosis. Methods: To determine whether the immune cell profile within CRC tissue could influence prognosis, we employed three computational methodologies (CIBERSORT, xCell and MCPcounter) to predict abundance of immune cell types, based on gene expression. This was done using two patient cohorts, TCGA and BC Cancer Personalized OncoGenomics (POG). Results: We observed significant differences in immune cell composition between CRC and normal adjacent colon tissue, as well as differences based on the method of analysis. Evaluation of survival based on immune cell types revealed dendritic cells as a positive prognostic marker, consistently across methodologies. Mast cells were also found to be a positive prognostic marker, but in a stage-dependent manner. Unsupervised cluster analysis demonstrated that significant differences in immune cell composition have a more pronounced effect on prognosis in early-stage CRC, compared to late-stage CRC. This analysis revealed a distinct group of individuals with early-stage CRC who have an immune infiltration signature that indicates better survival probability. Conclusions: Taken together, characterization of the immune landscape in CRC has provided a powerful tool to assess prognosis. We anticipate that further characterization of the immune landscape will facilitate use of immunotherapies in CRC.
Affiliation(s)
- Zainab Ali Bazzi
  - Division of Gastroenterology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Sophie Sneddon
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Peter G Y Zhang
  - Division of Gastroenterology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
- Isabella T Tai
  - Division of Gastroenterology, Department of Medicine, University of British Columbia, Vancouver, BC, Canada
  - Canada's Michael Smith Genome Sciences Centre, BC Cancer, Vancouver, BC, Canada
|
24
|
Hapfelmeier A, Hornung R, Haller B. Efficient permutation testing of variable importance measures by the example of random forests. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
|
25
|
Random Forests in Count Data Modelling: An Analysis of the Influence of Data Features and Overdispersion on Regression Performance. J Probab Stat 2022. [DOI: 10.1155/2022/2833537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022] Open
Abstract
Machine learning algorithms, especially random forests (RFs), have become an integrated part of the modern scientific methodology and represent an efficient alternative to conventional parametric algorithms. This study aimed to assess the influence of data features and overdispersion on RF regression performance. We assessed the effect of types of predictors (100, 75, 50, and 20% continuous, and 100% categorical), the number of predictors (p = 8, 16, and 24), and the sample size (N = 50, 250, and 1250) on RF parameter settings. We also compared RF performance to that of classical generalized linear models (Poisson, negative binomial, and zero-inflated Poisson) and the linear model applied to log-transformed data. Two real datasets were analysed to demonstrate the usefulness of RF for overdispersed data modelling. Goodness-of-fit statistics such as root mean square error (RMSE) and biases were used to determine RF accuracy and validity. Results revealed that the number of variables to be randomly selected for each split, the proportion of samples to train the model, the minimal number of samples within each terminal node, and RF regression performance are not influenced by the sample size, number, and type of predictors. However, the ratio of observations to the number of predictors affects the stability of the best RF parameters. RF performs well for all types of covariates and different levels of dispersion. The magnitude of dispersion does not significantly influence RF predictive validity. In contrast, its predictive accuracy is significantly influenced by the magnitude of dispersion in the response variable, conditional on the explanatory variables. RF has performed almost as well as the models of the classical Poisson family in the presence of overdispersion. Given RF’s advantages, it is an appropriate statistical alternative for count data.
|
26
|
Gilman E, Chaloupka M, Benaka LR, Bowlby H, Fitchett M, Kaiser M, Musyl M. Phylogeny explains capture mortality of sharks and rays in pelagic longline fisheries: a global meta-analytic synthesis. Sci Rep 2022; 12:18164. [PMID: 36307432 PMCID: PMC9616952 DOI: 10.1038/s41598-022-21976-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/04/2022] [Accepted: 10/07/2022] [Indexed: 12/31/2022] Open
Abstract
Apex and mesopredators such as elasmobranchs are important for maintaining ocean health and are the focus of conservation efforts to mitigate exposure to fishing and other anthropogenic hazards. Quantifying fishing mortality components such as at-vessel mortality (AVM) is necessary for effective bycatch management. We assembled a database for 61 elasmobranch species and conducted a global meta-synthesis to estimate pelagic longline AVM rates. Evolutionary history was a significant predictor of AVM, accounting for up to 13% of variance in Bayesian phylogenetic meta-regression models for Lamniformes and Carcharhiniformes clades. Phylogenetically related species may have a high degree of shared traits that explain AVM. Model-estimated posterior mean AVM rates ranged from 5% (95% HDI 0.1%-16%) for pelagic stingrays to 76% (95% HDI 49%-90%) for salmon sharks. Measures that reduce catch, and hence AVM levels, such as input controls, bycatch quotas and gear technology to increase selectivity are appropriate for species with higher AVM rates. In addition to reducing catchability, handling-and-release practices and interventions such as retention bans in shark sanctuaries and bans on shark finning and trade hold promise for species with lower AVM rates. Robust, and where applicable, phylogenetically-adjusted elasmobranch AVM rates are essential for evidence-informed bycatch policy.
Affiliation(s)
- Eric Gilman
  - The Safina Center, Honolulu, USA
  - The Lyell Centre, Heriot-Watt University, Edinburgh, UK
- Milani Chaloupka
  - Ecological Modelling Services Pty Ltd and Marine Spatial Ecology Lab, University of Queensland, Brisbane, Australia
- Lee R Benaka
  - Office of Science and Technology, U.S. NOAA Fisheries, Silver Spring, USA
- Heather Bowlby
  - Bedford Institute of Oceanography, Fisheries and Oceans, Dartmouth, Canada
- Mark Fitchett
  - Western Pacific Regional Fishery Management Council, Honolulu, USA
- Michel Kaiser
  - The Lyell Centre, Heriot-Watt University, Edinburgh, UK
|
27
|
Comparison of tree-based ensemble models for regression. Commun Stat Appl Methods 2022. [DOI: 10.29220/csam.2022.29.5.561] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
28
|
Optimized Metabotype Definition Based on a Limited Number of Standard Clinical Parameters in the Population-Based KORA Study. Life (Basel) 2022; 12:1460. [PMID: 36294895 PMCID: PMC9604647 DOI: 10.3390/life12101460] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/10/2022] [Revised: 09/07/2022] [Accepted: 09/16/2022] [Indexed: 11/23/2022] Open
Abstract
The aim of metabotyping is to categorize individuals into metabolically similar groups. Earlier studies that explored metabotyping used numerous parameters, which limited its transferability. Therefore, this study aimed to identify metabotypes based on a set of standard laboratory parameters that are regularly determined in clinical practice. K-means cluster analysis was used to group 3001 adults from the KORA F4 cohort into three clusters. We identified the clustering parameters through variable importance methods, without including any specific disease endpoint. Several unique combinations of selected parameters were used to create different metabotype models. Metabotype models were then described and evaluated, based on various metabolic parameters and on the incidence of cardiometabolic diseases. As a result, two optimal models were identified: a model composed of five parameters, which were fasting glucose, HDLc, non-HDLc, uric acid, and BMI (the metabolic disease model) for clustering; and a model that included four parameters, which were fasting glucose, HDLc, non-HDLc, and triglycerides (the cardiovascular disease model). These identified metabotypes are based on a few common parameters that are measured in everyday clinical practice. These metabotypes are cost-effective, and can be easily applied on a large scale in order to identify specific risk groups that can benefit most from measures to prevent cardiometabolic diseases, such as dietary recommendations and lifestyle interventions.
|
29
|
Safi Z, Venugopal N, Ali H, Makhlouf M, Farooq F, Boughorbel S. Analysis of risk factors progression of preterm delivery using electronic health records. BioData Min 2022; 15:17. [PMID: 35978434 PMCID: PMC9386949 DOI: 10.1186/s13040-022-00298-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Accepted: 06/21/2022] [Indexed: 11/24/2022] Open
Abstract
Background: Preterm deliveries have many negative health implications for both mother and child. Identifying the population-level factors that increase the risk of preterm deliveries is an important step towards mitigating the impact and reducing the frequency of preterm deliveries. The purpose of this work is to identify preterm delivery risk factors and their progression throughout the pregnancy from a large collection of Electronic Health Records (EHR). Results: The study cohort includes about 60,000 deliveries in the USA with the complete medical history from EHR for diagnoses, medications and procedures. We propose a temporal analysis of risk factors by estimating and comparing risk ratios and variable importance at different time points prior to the delivery event. We selected the following time points before delivery: 0, 12 and 24 week(s) of gestation. We did so by conducting a retrospective cohort study of patient history for a selected set of mothers who delivered preterm and a control group of mothers who delivered full-term. We analyzed the extracted data using logistic regression and random forest models. The results of our analyses showed that the highest risk ratio and variable importance correspond to a history of previous preterm delivery. Other risk factors were identified, some of which are consistent with those reported in the literature; others need further investigation. Conclusions: The comparative analysis of the risk factors at different time points showed that risk factors in early pregnancy relate to patient history and chronic conditions, while the risk factors in late pregnancy are specific to the current pregnancy. Our analysis unifies several previously reported studies on preterm risk factors. It also gives important insights into the changes of risk factors in the course of pregnancy. The code used for data analysis will be made available on GitHub.
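Note on risk ratios: a risk ratio compares the outcome rate among exposed and unexposed groups. The sketch below computes it with a 95% confidence interval using the common log-transform (Wald) approximation. The function name and the counts in the usage line are hypothetical and are not drawn from the study's cohort.

```python
import math


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Risk ratio with an approximate confidence interval on the log scale."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented example: 30 of 100 exposed vs 10 of 100 unexposed with the outcome.
rr, lower, upper = risk_ratio(30, 100, 10, 100)
```

Repeating such a calculation on the records available at each time point is one plausible way to compare how a factor's risk ratio changes during pregnancy.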
Affiliation(s)
- Zeineb Safi
  - Research Department, Sidra Medicine, Doha, Qatar
- Haytham Ali
  - Division of Neonatalogy, Sidra Medicine, Doha, Qatar
- Michel Makhlouf
  - Department of Maternal-Fetal Medicine, Sidra Medicine, Doha, Qatar
- Faisal Farooq
  - Qatar Computing Research Institute, HBKU, Doha, Qatar
|
30
|
Luo S, Jiang X, He Y, Li J, Jiao W, Zhang S, Xu F, Han Z, Sun J, Yang J, Wang X, Ma X, Lin Z. Multi-dimensional variables and feature parameter selection for aboveground biomass estimation of potato based on UAV multispectral imagery. Front Plant Sci 2022; 13:948249. [PMID: 35968116 PMCID: PMC9372391 DOI: 10.3389/fpls.2022.948249] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/19/2022] [Accepted: 07/07/2022] [Indexed: 06/15/2023]
Abstract
Aboveground biomass (AGB) is an essential indicator for assessing plant development and guiding agricultural production management in the field. Therefore, efficient and accurate access to crop AGB information can provide a timely and precise yield estimation, which is strong evidence for securing food supply and trade. In this study, the spectral, texture, geometric, and frequency-domain variables were extracted through multispectral imagery of drones, and the importance of each variable for different dimensional parameter combinations was computed by three feature parameter selection methods. The selected variables from the different combinations were used to perform potato AGB estimation. The results showed that compared with no feature parameter selection, the accuracy and robustness of the AGB prediction models were significantly improved after parameter selection. The random forest based on out-of-bag (RF-OOB) method proved to be the most effective feature selection method, and in combination with RF regression, the coefficient of determination (R²) of the AGB validation model could reach 0.90, with root mean square error (RMSE), mean absolute error (MAE), and normalized RMSE (nRMSE) of 71.68 g/m², 51.27 g/m², and 11.56%, respectively. Meanwhile, the regression models of the RF-OOB method provided a good solution to the problem that high AGB values were underestimated with the variables of four dimensions. Moreover, the precision of AGB estimates was improved as the dimensionality of parameters increased. This work can contribute to a rapid, efficient, and non-destructive means of obtaining AGB information for crops as well as provide technical support for high-throughput plant phenotype screening.
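Note on the validation metrics: the sketch below shows how R², RMSE, MAE and nRMSE are commonly computed from observed and predicted values. The abstract does not state how nRMSE was normalized; dividing RMSE by the mean of the observed values is an assumption made here, and the function name and sample numbers are purely illustrative.

```python
import math


def regression_metrics(observed, predicted):
    """R^2, RMSE, MAE and nRMSE (RMSE / mean observed, an assumed definition)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,
        "mae": mae,
        "nrmse": rmse / mean_obs,
    }


# Toy values only; units would be g/m² for RMSE and MAE in the AGB setting.
metrics = regression_metrics([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0])
```

RMSE and MAE carry the units of the response, while R² and nRMSE are unitless, which is why the abstract reports the first two in g/m² and nRMSE as a percentage.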
Affiliation(s)
- Shanjun Luo
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
  - Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing, China
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
- Xueqin Jiang
  - School of Remote Sensing and Information Engineering, Wuhan University, Wuhan, China
- Yingbin He
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
  - Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters, Nanjing University of Information Science & Technology, Nanjing, China
- Jianping Li
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
- Weihua Jiao
  - Center for Agricultural and Rural Economic Research, Shandong University of Finance and Economics, Jinan, China
- Shengli Zhang
  - Potato Science Institute, Jilin Academy of Vegetables and Flower Sciences, Changchun, China
- Fei Xu
  - Potato Science Institute, Jilin Academy of Vegetables and Flower Sciences, Changchun, China
- Zhongcai Han
  - Potato Science Institute, Jilin Academy of Vegetables and Flower Sciences, Changchun, China
- Jing Sun
  - Potato Science Institute, Jilin Academy of Vegetables and Flower Sciences, Changchun, China
- Jinpeng Yang
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiangyi Wang
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
- Xintian Ma
  - Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing, China
- Zeru Lin
  - School of Economics and Management, Tiangong University, Tianjin, China
|
31
|
Hornung R, Boulesteix AL. Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
32
|
Loef B, Wong A, Janssen NAH, Strak M, Hoekstra J, Picavet HSJ, Boshuizen HCH, Verschuren WMM, Herber GCM. Using random forest to identify longitudinal predictors of health in a 30-year cohort study. Sci Rep 2022; 12:10372. [PMID: 35725920 PMCID: PMC9209521 DOI: 10.1038/s41598-022-14632-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/27/2021] [Accepted: 06/09/2022] [Indexed: 11/09/2022] Open
Abstract
Due to the wealth of exposome data from longitudinal cohort studies that is currently available, the need for methods to adequately analyze these data is growing. We propose an approach in which machine learning (ML) is used to identify longitudinal exposome-related predictors of health, and illustrate its potential through an application. Our application involves studying the relation between exposome and self-perceived health based on the 30-year running Doetinchem Cohort Study. Random Forest (RF) was used to identify the strongest predictors due to its favorable prediction performance in prior research. The relation between predictors and outcome was visualized with partial dependence and accumulated local effects plots. To facilitate interpretation, exposures were summarized by expressing them as the average exposure and average trend over time. The RF model's ability to discriminate poor from good self-perceived health was acceptable (Area-Under-the-Curve = 0.707). Nine exposures from different exposome-related domains were largely responsible for the model's performance, while 87 exposures seemed to contribute little to the performance. Our approach demonstrates that ML models are more interpretable than widely believed, and can be applied to identify important longitudinal predictors of health over the life course in studies with repeated measures of exposure. The approach is context-independent and broadly applicable.
Affiliation(s)
- Bette Loef
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- Albert Wong
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- Nicole A H Janssen
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- Maciek Strak
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- Jurriaan Hoekstra
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- H Susan J Picavet
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
- H C Hendriek Boshuizen
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
  - Wageningen University and Research, Wageningen, The Netherlands
- W M Monique Verschuren
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
  - Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Gerrie-Cor M Herber
  - Center for Nutrition, Prevention and Health Services, National Institute for Public Health and the Environment, P.O. Box 1, 3720 BA, Bilthoven, The Netherlands
|
33
|
Oishi K, Soldan A, Pettigrew C, Hsu J, Mori S, Albert M, Oishi K. Changes in pairwise functional connectivity associated with changes in cognitive performance in cognitively normal older individuals: A two-year observational study. Neurosci Lett 2022; 781:136618. [PMID: 35398188 PMCID: PMC9990522 DOI: 10.1016/j.neulet.2022.136618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Revised: 03/15/2022] [Accepted: 04/03/2022] [Indexed: 10/18/2022]
Abstract
Neurobiological substrates of cognitive decline in cognitively normal older individuals have been investigated by resting-state functional magnetic resonance imaging, but little is known about the relationship between longitudinal changes in the whole brain. In this study, we examined two-year changes in functional connectivity among 80 gray matter areas and investigated the relationship to two-year changes in cognitive performance. A cross-validated permutation variable importance measure was applied to select features related to a change in cognitive performance. Age-corrected changes in eleven pairs of functional connections were selected as important features, all related to brain areas that belong to the default mode network. A linear regression model with cross-validation demonstrated a mean correlation coefficient of 0.55 between measured and predicted changes in the cognitive composite score. These results suggest that intra- and inter-network connections in the default mode network are associated with cognitive changes over two years among cognitively normal individuals.
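Note on permutation variable importance: the idea is to shuffle one feature and measure how much the prediction error grows. The sketch below is a plain hold-out version written for illustration; it is not the cross-validated procedure used in the study, and the function name, the `predict` callable and the toy data are all assumptions.

```python
import random


def permutation_importance(predict, rows, targets, column, n_repeats=20, seed=0):
    """Mean increase in mean squared error after shuffling one feature column.

    predict -- any callable mapping a feature row (list) to a numeric prediction
    rows    -- list of feature rows, each a list of numbers
    targets -- observed outcomes, one per row
    column  -- index of the feature to permute
    """
    rng = random.Random(seed)

    def mse(data):
        return sum((predict(r) - t) ** 2 for r, t in zip(data, targets)) / len(targets)

    baseline = mse(rows)
    increases = []
    for _ in range(n_repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with the permuted value; other features stay intact.
        permuted = [row[:column] + [v] + row[column + 1:] for row, v in zip(rows, shuffled)]
        increases.append(mse(permuted) - baseline)
    return sum(increases) / n_repeats


# Toy setup: the outcome depends only on feature 0, so feature 1 is irrelevant.
rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]
importance_relevant = permutation_importance(model, rows, targets, column=0)
importance_irrelevant = permutation_importance(model, rows, targets, column=1)
```

Features whose importance stays near zero under permutation are candidates for removal, which is the sense in which such a measure can be used for feature selection.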
Affiliation(s)
- Kumiko Oishi
  - Center for Imaging Science, The Johns Hopkins University, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, USA
- Anja Soldan
  - Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Corinne Pettigrew
  - Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Johnny Hsu
  - Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Susumu Mori
  - Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Marilyn Albert
  - Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Kenichi Oishi
  - Department of Radiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA
|
34
|
Lösel H, Shakiba N, Wenck S, Le Tan P, Arndt M, Seifert S, Hackl T, Fischer M. Impact of Freeze-Drying on the Determination of the Geographical Origin of Almonds (Prunus dulcis Mill.) by Near-Infrared (NIR) Spectroscopy. Food Anal Methods 2022. [DOI: 10.1007/s12161-022-02329-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
Abstract
Near-infrared (NIR) spectroscopy is a proven tool for the determination of food authenticity, mainly because of good classification results and the possibility of industrial use due to its easy and fast application. Since water shows broad absorption bands, the water content of a sample should be as low as possible. Freeze-drying is a commonly used preparatory step to reduce the water content in the sample. However, freeze-drying, also known as lyophilization, is very time-consuming, impeding the widespread usage of NIR analysis as a rapid method for incoming goods inspections. We used a sample set of 72 almond samples from six economically relevant almond-producing countries to investigate how important lyophilization is for obtaining a well-performing classification model. For this approach, the samples were ground and lyophilized for 3 h, 24 h, and 48 h and compared to non-freeze-dried samples. Karl-Fischer titration of non-lyophilized samples showed that water contents ranged from 3.0 to 10.5% and remained constant at 0.36 ± 0.13% after a freeze-drying period of 24 h. The non-freeze-dried samples showed a classification accuracy of 93.9 ± 6.4%, which was in the same range as the samples which were freeze-dried for 3 h (94.2 ± 7.8%), 24 h (92.5 ± 8.7%), and 48 h (95.0 ± 9.0%). Feature selection was performed using the Boruta algorithm, which showed that signals from lipids and proteins are relevant for the origin determination. The presented study showed that samples with low water content, especially nuts, can be analyzed without the time-consuming preparation step of freeze-drying to obtain robust and fast results, which are especially required for incoming goods inspection.
|
35
|
Gholamnezhad P, Broumandnia A, Seydi V. An inverse model‐based multiobjective estimation of distribution algorithm using Random‐Forest variable importance methods. Comput Intell 2022. [DOI: 10.1111/coin.12315] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Affiliation(s)
| | - Ali Broumandnia
- Department of Computer South Tehran Branch, Islamic Azad University Tehran Iran
| | - Vahid Seydi
- Department of Computer South Tehran Branch, Islamic Azad University Tehran Iran
| |
|
36
|
Thums M, C. Ferreira L, Jenner C, Jenner M, Harris D, Davenport A, Andrews-Goff V, Double M, Möller L, Attard CR, Bilgmann K, G. Thomson P, McCauley R. Pygmy blue whale movement, distribution and important areas in the Eastern Indian Ocean. Glob Ecol Conserv 2022. [DOI: 10.1016/j.gecco.2022.e02054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|
37
|
Oishi K, Soldan A, Pettigrew C, Hsu J, Mori S, Albert M, Oishi K, The BIOCARD Research Team. Dataset of relationship between longitudinal change in cognitive performance and functional connectivity in cognitively normal older individuals. Data Brief 2022; 42:108302. [PMID: 35669007 PMCID: PMC9163691 DOI: 10.1016/j.dib.2022.108302] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/13/2022] [Revised: 05/05/2022] [Accepted: 05/17/2022] [Indexed: 11/30/2022] Open
Abstract
The data show an association between measured and predicted changes in cognitive performance in older adults who are cognitively normal. Changes in cognitive performance over two years were assessed using the Cognitive Composite Score. The prediction of change in cognitive function was based on changes in pairwise functional connectivity between 80 gray matter regions examined by resting-state functional magnetic resonance imaging. A feature extraction process based on the Variable Importance Testing Approach (VITA) identified changes in 11 pairs of functional connections associated with the default mode network as features related to changes in cognitive performance. Linear and elastic net regression models were applied to these 11 features to predict changes in cognitive performance over two years. A relationship between the 11 features and the geriatric depression score was also shown. The dataset supplements the research findings in the "Changes in pairwise functional connectivity associated with changes in cognitive performance in cognitively normal older individuals: a two-year observational study" published in Oishi et al. (2022). The raw rs-fMRI correlation matrix and associated clinical data can be accessed upon request from the BIOCARD website (www.biocard-se.org) and can be reused for predictive model building.
Affiliation(s)
- Kumiko Oishi
- Center for Imaging Science, Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, USA
| | - Anja Soldan
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Corinne Pettigrew
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Johnny Hsu
- The Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, 208 Traylor Building, 720 Rutland Avenue, Baltimore, MD 21205, USA
| | - Susumu Mori
- The Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, 208 Traylor Building, 720 Rutland Avenue, Baltimore, MD 21205, USA
| | - Marilyn Albert
- Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Kenichi Oishi
- The Russell H. Morgan Department of Radiology and Radiological Science, The Johns Hopkins University School of Medicine, 208 Traylor Building, 720 Rutland Avenue, Baltimore, MD 21205, USA
- Corresponding author.
| | | |
|
38
|
Provable Boolean interaction recovery from tree ensemble obtained via random forests. Proc Natl Acad Sci U S A 2022; 119:e2118636119. [PMID: 35609192 PMCID: PMC9295780 DOI: 10.1073/pnas.2118636119] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022] Open
Abstract
Significance Random Forests (RFs) are among the most successful machine-learning algorithms in terms of prediction accuracy. In many domain problems, however, the primary goal is not prediction, but to understand the data-generation process, in particular, finding important features and feature interactions. There exists strong empirical evidence that RF-based methods, in particular iterative RF (iRF), are very successful in terms of detecting feature interactions. In this work, we propose a biologically motivated, Boolean interaction model. Using this model, we complement the existing empirical evidence with theoretical evidence for the ability of iRF-type methods to select desirable interactions. Our theoretical analysis also yields deeper insights into the general interaction selection mechanism of decision-tree algorithms and the importance of feature subsampling.
|
39
|
Khataeipour SJ, Anaraki JR, Bozorgi A, Rayner M, A Basset F, Fuller D. Predicting lying, sitting and walking at different intensities using smartphone accelerometers at three different wear locations: hands, pant pockets, backpack. BMJ Open Sport Exerc Med 2022; 8:e001242. [PMID: 35601137 PMCID: PMC9086604 DOI: 10.1136/bmjsem-2021-001242] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 04/02/2022] [Indexed: 12/12/2022] Open
Abstract
Objective This study uses machine learning (ML) to develop methods for estimating activity type/intensity using smartphones, to evaluate the accuracy of these models for classifying activity, and to evaluate differences in accuracy between three different wear locations. Method Forty-eight participants were recruited to complete a series of activities while carrying Samsung phones in three different locations: backpack, right hand and right pocket. They were asked to sit, lie down, and walk and run at three Metabolic Equivalent Task (METs), five METs and seven METs. Raw accelerometer data were collected. We used the R activity counts package to calculate activity counts and generated new features based on the raw accelerometer data. We evaluated and compared several ML algorithms (Random Forest (RF), Support Vector Machine, Naïve Bayes, Decision Tree, Linear Discriminant Analysis and k-Nearest Neighbours) using the caret package (V.6.0-86). Using the combination of the raw accelerometer data and the computed features leads to high model accuracy. Results Using raw accelerometer data, RF models achieved an accuracy of 92.90% for the right pocket location, 89% for the right hand location and 90.8% for the backpack location. Using activity counts, RF models achieved an accuracy of 51.4% for the right pocket location, 48.5% for the right hand location and 52.1% for the backpack location. Conclusion Our results suggest that using smartphones to measure physical activity is accurate for estimating activity type/intensity, and that ML methods such as RF with feature engineering techniques can accurately classify physical activity intensity levels in laboratory settings.
Affiliation(s)
- Seyed Javad Khataeipour
- Department of Computer Science, Faculty of Science, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | | | - Arastoo Bozorgi
- Department of Computer Science, Faculty of Science, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | - Machel Rayner
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | - Fabien A Basset
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| | - Daniel Fuller
- Department of Computer Science, Faculty of Science, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
- School of Human Kinetics and Recreation, Memorial University of Newfoundland, St. John's, Newfoundland, Canada
| |
|
40
|
Ho L, Jerves-Cobo R, Barthel M, Six J, Bode S, Boeckx P, Goethals P. Greenhouse gas dynamics in an urbanized river system: influence of water quality and land use. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2022; 29:37277-37290. [PMID: 35048344 DOI: 10.1007/s11356-021-18081-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 06/22/2021] [Accepted: 12/08/2021] [Indexed: 06/14/2023]
Abstract
Rivers act as a natural source of greenhouse gases (GHGs). However, anthropogenic activities can substantially alter the chemical composition and microbial communities of rivers, consequently affecting their GHG production. To investigate these impacts, we assessed the accumulation of CO2, CH4, and N2O in an urban river system (Cuenca, Ecuador). High variation of dissolved GHG concentrations was found among river tributaries that mainly depended on water quality and land use. By using Prati and Oregon water quality indices, we observed a clear pattern between water quality and the dissolved GHG concentration: the more polluted the sites were, the higher were their dissolved GHG concentrations. When river water quality deteriorated from acceptable to very heavily polluted, the mean value of pCO2 and dissolved CH4 increased by up to ten times while N2O concentrations increased 15-fold. Furthermore, surrounding land-use types, i.e., urban, roads, and agriculture, could considerably affect the GHG production in the rivers. Particularly, the average pCO2 and dissolved N2O of the sites close to urban areas were almost four times higher than those of the natural sites, while this ratio was 25 times in the case of CH4, reflecting the finding that urban areas had the worst water quality, with almost 70% of their sites being polluted, while this proportion for natural areas was only 12.5%. Lastly, we identified dissolved oxygen, ammonium, and flow characteristics as the most important factors for GHG production by applying statistical analysis and random forests. These results highlighted the impacts of land-use types on the production of GHGs in rivers contaminated by sewage discharges and surface runoff.
Affiliation(s)
- Long Ho
- Department of Animal Sciences, Ghent University, Ghent, Belgium.
| | - Ruben Jerves-Cobo
- Department of Animal Sciences, Ghent University, Ghent, Belgium
- PROMAS, Universidad de Cuenca, Cuenca, Ecuador
- Department of Data Analysis and Mathematical Modelling, BIOMATH, Ghent University, Ghent, Belgium
| | - Matti Barthel
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
| | - Johan Six
- Department of Environmental Systems Science, ETH Zurich, Zurich, Switzerland
| | - Samuel Bode
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
| | - Pascal Boeckx
- Department of Green Chemistry and Technology, Isotope Bioscience Laboratory - ISOFYS, Ghent University, Ghent, Belgium
| | - Peter Goethals
- Department of Animal Sciences, Ghent University, Ghent, Belgium
| |
|
41
|
Agarwal N, Aabedi AA, Torres-Espin A, Chou A, Wozny TA, Mummaneni PV, Burke JF, Ferguson AR, Kyritsis N, Dhall SS, Weinstein PR, Duong-Fernandez X, Pan J, Singh V, Hemmerle DD, Talbott JF, Whetstone WD, Bresnahan JC, Manley GT, Beattie MS, DiGiorgio AM. Decision tree–based machine learning analysis of intraoperative vasopressor use to optimize neurological improvement in acute spinal cord injury. Neurosurg Focus 2022; 52:E9. [DOI: 10.3171/2022.1.focus21743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Accepted: 01/20/2022] [Indexed: 11/06/2022]
Abstract
OBJECTIVE
Previous work has shown that maintaining mean arterial pressures (MAPs) between 76 and 104 mm Hg intraoperatively is associated with improved neurological function at discharge in patients with acute spinal cord injury (SCI). However, whether temporary fluctuations in MAPs outside of this range can be tolerated without impairment of recovery is unknown. This retrospective study builds on previous work by implementing machine learning to derive clinically actionable thresholds for intraoperative MAP management guided by neurological outcomes.
METHODS
Seventy-four surgically treated patients were retrospectively analyzed as part of a longitudinal study assessing outcomes following SCI. Each patient underwent intraoperative hemodynamic monitoring with recordings at 5-minute intervals for a cumulative 28,594 minutes, resulting in 5718 unique data points for each parameter. The type of vasopressor used, dose, drug-related complications, average intraoperative MAP, and time spent in an extreme MAP range (< 76 mm Hg or > 104 mm Hg) were collected. Outcomes were evaluated by measuring the change in American Spinal Injury Association Impairment Scale (AIS) grade over the course of acute hospitalization. Features most predictive of an improvement in AIS grade were determined statistically by generating random forests with 10,000 iterations. Recursive partitioning was used to establish clinically intuitive thresholds for the top features.
RESULTS
At discharge, a significant improvement in AIS grade was noted by an average of 0.71 levels (p = 0.002). The hemodynamic parameters most important in predicting improvement were the amount of time intraoperative MAPs were in extreme ranges and the average intraoperative MAP. Patients with average intraoperative MAPs between 80 and 96 mm Hg throughout surgery had improved AIS grades at discharge. All patients with average intraoperative MAP > 96.3 mm Hg had no improvement. A threshold of 93 minutes spent in an extreme MAP range was identified after which the chance of neurological improvement significantly declined. Finally, the use of dopamine as compared to norepinephrine was associated with higher rates of significant cardiovascular complications (50% vs 25%, p < 0.001).
CONCLUSIONS
An average intraoperative MAP value between 80 and 96 mm Hg was associated with improved outcome, corroborating previous results and supporting the clinical verifiability of the model. Additionally, an accumulated time of 93 minutes or longer outside of the MAP range of 76–104 mm Hg is associated with worse neurological function at discharge among patients undergoing emergency surgical intervention for acute SCI.
Affiliation(s)
- Nitin Agarwal
- Department of Neurological Surgery, University of California, San Francisco
| | | | - Abel Torres-Espin
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Austin Chou
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Thomas A. Wozny
- Department of Neurological Surgery, University of California, San Francisco
| | - Praveen V. Mummaneni
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
| | - John F. Burke
- Department of Neurological Surgery, University of California, San Francisco
| | - Adam R. Ferguson
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
- San Francisco Veterans Affairs Healthcare System, San Francisco
| | - Nikos Kyritsis
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Sanjay S. Dhall
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Philip R. Weinstein
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
| | - Xuan Duong-Fernandez
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Jonathan Pan
- Department of Neurological Surgery, University of California, San Francisco
- Department of Anesthesia and Perioperative Care, University of California, San Francisco
| | - Vineeta Singh
- Department of Neurological Surgery, University of California, San Francisco
- Department of Neurology, University of California, San Francisco
| | - Debra D. Hemmerle
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Jason F. Talbott
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
- Department of Radiology and Biomedical Imaging, University of California, San Francisco
| | - William D. Whetstone
- Department of Emergency Medicine, University of California, San Francisco, California
| | - Jacqueline C. Bresnahan
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Geoffrey T. Manley
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| | - Michael S. Beattie
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
- San Francisco Veterans Affairs Healthcare System, San Francisco
| | - Anthony M. DiGiorgio
- Department of Neurological Surgery, University of California, San Francisco
- Weill Institute for Neurosciences, Brain and Spinal Injury Center, University of California, San Francisco
- Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco
| |
|
42
|
Lau M, Wigmann C, Kress S, Schikowski T, Schwender H. Evaluation of tree-based statistical learning methods for constructing genetic risk scores. BMC Bioinformatics 2022; 23:97. [PMID: 35313824 PMCID: PMC8935722 DOI: 10.1186/s12859-022-04634-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/29/2021] [Accepted: 03/14/2022] [Indexed: 04/11/2024] Open
Abstract
Background Genetic risk scores (GRS) summarize genetic features such as single nucleotide polymorphisms (SNPs) in a single statistic with respect to a given trait. So far, GRS are typically built using generalized linear models or regularized extensions. However, these linear methods are usually not able to incorporate gene-gene interactions or non-linear SNP-response relationships. Tree-based statistical learning methods such as random forests and logic regression may be an alternative to such regularized-regression-based methods and are investigated in this article. Moreover, we consider modifications of random forests and logic regression for the construction of GRS. Results In an extensive simulation study and an application to a real data set from a German cohort study, we show that both tree-based approaches can outperform elastic net when constructing GRS for binary traits. In particular, a modification of logic regression called logic bagging could induce comparatively high predictive power as measured by the area under the curve and the statistical power. Even when considering no epistatic interaction effects but only marginal genetic effects, the regularized regression method led in most cases to inferior results. Conclusions When constructing GRS, we recommend taking random forests and logic bagging into account, in particular, if it can be assumed that possibly unknown epistasis between SNPs is present. To develop the best possible prediction models, extensive joint hyperparameter optimizations should be conducted. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04634-w.
Affiliation(s)
- Michael Lau
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Claudia Wigmann
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Sara Kress
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Tamara Schikowski
- IUF - Leibniz Research Institute for Environmental Medicine, Düsseldorf, Germany
| | - Holger Schwender
- Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
| |
|
43
|
Yin D, Chen D, Tang Y, Dong H, Li X. Adaptive feature selection with shapley and hypothetical testing: Case study of EEG feature engineering. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022]
|
44
|
Reproducible neuroimaging features for diagnosis of autism spectrum disorder with machine learning. Sci Rep 2022; 12:3057. [PMID: 35197468 PMCID: PMC8866395 DOI: 10.1038/s41598-022-06459-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/27/2021] [Accepted: 01/25/2022] [Indexed: 12/31/2022] Open
Abstract
Autism spectrum disorder (ASD) is the fourth most common neurodevelopmental disorder, with a prevalence of 1 in 160 children. Accurate diagnosis relies on experts, but such individuals are scarce. This has led to increasing interest in the development of machine learning (ML) models that can integrate neuroimaging features from functional and structural MRI (fMRI and sMRI) to help reveal central nervous system alterations characteristic of ASD. We optimized and compared the performance of 12 of the most popular and powerful ML models. Each was separately trained using 15 different combinations of fMRI and sMRI features and optimized with an unbiased model search. Deep learning models predicted ASD with the highest diagnostic accuracy and generalized well to other MRI datasets. Our model achieves state-of-the-art 80% area under the ROC curve (AUROC) in diagnosis on test data from the IMPAC dataset; and 86% and 79% AUROC on the external ABIDE I and ABIDE II datasets (with further improvement to 93% and 90% after supervised domain adaptation). The highest performing models identified reproducible putative biomarkers for accurate ASD diagnosis in accord with known ASD markers as well as novel cerebellar biomarkers. Such reproducibility lends credence to their tremendous potential for defining and using a set of truly generalizable ASD biomarkers that will advance scientific understanding of neuronal changes in ASD.
|
45
|
|
46
|
Functional random forests for curve response. Sci Rep 2021; 11:24159. [PMID: 34921167 PMCID: PMC8683425 DOI: 10.1038/s41598-021-02265-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 08/20/2021] [Indexed: 11/22/2022] Open
Abstract
The rapid advancement of functional data in various application fields has increased the demand for advanced statistical approaches that can incorporate complex structures and nonlinear associations. In this article, we propose a novel functional random forests (FunFor) approach to model the functional data response that is densely and regularly measured, as an extension of the landmark work of Breiman, who introduced traditional random forests for a univariate response. The FunFor approach is able to predict curve responses for new observations and selects important variables from a large set of scalar predictors. The FunFor approach inherits the efficiency of the traditional random forest approach in detecting complex relationships, including nonlinear and high-order interactions. Additionally, it is a non-parametric approach without the imposition of parametric and distributional assumptions. Eight simulation settings and one real-data analysis consistently demonstrate the excellent performance of the FunFor approach in various scenarios. In particular, FunFor successfully ranks the true predictors as the most important variables, while achieving the most robust variable selections and the smallest prediction errors when compared with three other relevant approaches. Although motivated by a biological leaf shape data analysis, the proposed FunFor approach has great potential to be widely applied in various fields due to its minimal requirement on tuning parameters and its distribution-free and model-free nature. An R package named 'FunFor', implementing the FunFor approach, is available at GitHub.
|
47
|
Musolf AM, Holzinger ER, Malley JD, Bailey-Wilson JE. What makes a good prediction? Feature importance and beginning to open the black box of machine learning in genetics. Hum Genet 2021; 141:1515-1528. [PMID: 34862561 PMCID: PMC9360120 DOI: 10.1007/s00439-021-02402-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 05/13/2021] [Accepted: 11/08/2021] [Indexed: 01/26/2023]
Abstract
Genetic data have become increasingly complex within the past decade, leading researchers to pursue increasingly complex questions, such as those involving epistatic interactions and protein prediction. Traditional methods are ill-suited to answer these questions, but machine learning (ML) techniques offer an alternative solution. ML algorithms are commonly used in genetics to predict or classify subjects, but some methods evaluate which features (variables) are responsible for creating a good prediction; this is called feature importance. This is critical in genetics, as researchers are often interested in which features (e.g., SNP genotype or environmental exposure) are responsible for a good prediction. This allows for the deeper analysis beyond simple prediction, including the determination of risk factors associated with a given phenotype. Feature importance further permits the researcher to peer inside the black box of many ML algorithms to see how they work and which features are critical in informing a good prediction. This review focuses on ML methods that provide feature importance metrics for the analysis of genetic data. Five major categories of ML algorithms: k nearest neighbors, artificial neural networks, deep learning, support vector machines, and random forests are described. The review ends with a discussion of how to choose the best machine for a data set. This review will be particularly useful for genetic researchers looking to use ML methods to answer questions beyond basic prediction and classification.
Affiliation(s)
- Anthony M Musolf
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Emily R Holzinger
- Target Sciences, Informatics and Predictive Sciences, Bristol Myers Squibb, Cambridge, MA, USA
| | - James D Malley
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA
| | - Joan E Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, 333 Cassell Drive Suite 1200, Baltimore, MD, 21224, USA.
| |
|
48
|
Cerina R, Duch R. Polling India via regression and post-stratification of non-probability online samples. PLoS One 2021; 16:e0260092. [PMID: 34843519 PMCID: PMC8629219 DOI: 10.1371/journal.pone.0260092] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 11/02/2021] [Indexed: 11/19/2022] Open
Abstract
Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified MrP model of Indian vote preferences that proposes innovations to each of its three core components: stratification frame, training data, and a learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. And as a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from 'Jan Ki Baat'); the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
Affiliation(s)
- Roberto Cerina
- Data Analytics and Digitalisation, Maastricht University, Maastricht, Netherlands
- Raymond Duch
- Nuffield College, University of Oxford, Oxford, United Kingdom
|
49
|
|
50
|
Noro F, Marotta A, Bonaccio M, Costanzo S, Santonastaso F, Orlandi S, Tirozzi A, Parisi R, De Curtis A, Persichillo M, Gianfagna F, Di Castelnuovo A, Donati MB, Cerletti C, de Gaetano G, Iacoviello L, Gialluisi A, Izzi B. Fine-grained investigation of the relationship between human nutrition and global DNA methylation patterns. Eur J Nutr 2021; 61:1231-1243. [PMID: 34741648 DOI: 10.1007/s00394-021-02716-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2021] [Accepted: 10/18/2021] [Indexed: 10/19/2022]
Abstract
PURPOSE Nutrition is an important, modifiable environmental factor affecting human health by modulating epigenetic processes, including DNA methylation (5mC). Numerous studies have investigated the association of nutrition with global and gene-specific DNA methylation, and evidence from animal models has highlighted a role in the regulation of DNA hydroxymethylation (5hmC). However, a more comprehensive analysis of different layers of nutrition in association with global levels of 5mC and 5hmC is lacking. We investigated the association between global levels of 5mC and 5hmC and human nutrition by stratifying and analysing dietary patterns in different nutritional layers: adherence to the Mediterranean diet (MD), main food groups, and macronutrient and micronutrient intake. METHODS An ELISA technique was used to measure global 5mC and 5hmC levels in 1080 subjects from the Moli-sani cohort. Food intake during the 12 months before enrolment was assessed using the semi-quantitative EPIC food frequency questionnaire. Complementary approaches involving both classical statistics and supervised machine learning analyses were used to investigate the associations between global 5mC and 5hmC levels and adherence to the Mediterranean diet, main food groups, and macronutrient and micronutrient intake. RESULTS We found that global DNA methylation, but not hydroxymethylation, was associated with daily intake of zinc and vitamin B3. Random Forest algorithms predicting 5mC and 5hmC from intakes of food groups, macronutrients and micronutrients revealed a significant contribution of zinc, while vitamin B3 was reported among the most influential features. CONCLUSION We found that nutrition may affect global DNA methylation, suggesting a contribution of micronutrients previously implicated as cofactors in methylation pathways.
Affiliation(s)
- Fabrizia Noro
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Annalisa Marotta
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Marialaura Bonaccio
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Simona Costanzo
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Federica Santonastaso
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Sabatino Orlandi
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Alfonsina Tirozzi
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Roberta Parisi
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Amalia De Curtis
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Mariarosaria Persichillo
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Francesco Gianfagna
- Mediterranea Cardiocentro, Naples, Italy; Department of Medicine and Surgery, EPIMED Research Center, University of Insubria, Varese, Italy
- Maria Benedetta Donati
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Chiara Cerletti
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Giovanni de Gaetano
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Licia Iacoviello
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy; Department of Medicine and Surgery, EPIMED Research Center, University of Insubria, Varese, Italy
- Alessandro Gialluisi
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
- Benedetta Izzi
- Department of Epidemiology and Prevention, IRCCS NEUROMED, Via dell'Elettronica, 86077, Pozzilli, IS, Italy
|