1
Mallioris P, Stefanopoulou M, Luiken REC, Wagenaar JA, Stegeman A, Mughini-Gras L. Diseases associated with antimicrobial use in pig farms and risk factors thereof: A cross-sectional study in the Netherlands. Prev Vet Med 2025; 240:106535. [PMID: 40239452 DOI: 10.1016/j.prevetmed.2025.106535] [Received: 02/07/2024] [Revised: 03/28/2025] [Accepted: 04/13/2025] [Indexed: 04/18/2025]
Abstract
BACKGROUND Current antimicrobial use (AMU) in Dutch pig farms is driven by herd health status, as only therapeutic AMU is permitted. This study focused on weaners and sows with suckling piglets to examine the diseases associated with i) overall AMU (measured in Defined Daily Dosage Animal per year - DDDA/Y), ii) use versus non-use of specific antimicrobial classes, iii) total consumption of specific antimicrobial classes (in DDDA/Y), and iv) farm characteristics linked to the occurrence of diseases that require AMU. METHODS Cross-sectional data on AMU, disease aetiologies for group treatments, and farm characteristics were collected from 154 Dutch pig farms, representing the situation in 2019. Associations between disease occurrence as a predictor and AMU (overall and by antimicrobial class) as an outcome were analyzed using multivariable generalized linear regression models. Subsequently, mixed-effects conditional Random Forest analysis was used to identify farm characteristics associated with these diseases. RESULTS Group treatments for musculoskeletal/neurological diseases (MNDs) in suckling piglets, and individual treatments (of unknown aetiology) in sows and suckling piglets, were significantly associated with total AMU in sows and suckling piglets. AMU in weaners was significantly associated with respiratory diseases, MNDs, and individual treatments. Tetracyclines and penicillins were primarily used for respiratory diseases and MNDs in weaners, respectively, and for MNDs in sows and suckling piglets. Having a clear separation between clean and dirty outdoor areas on the farm and using boars from own production for estrus detection were both protective against the occurrence of respiratory conditions in weaners, whereas PRRS vaccination in suckling piglets was a risk factor. Streptococcus suis vaccination in sows and fully slatted floors were both risk factors for MNDs in weaners, whereas being an organic farm was protective. Use of disinfecting powders in sows increased MND risk in suckling piglets and sows, and a longer lactation period was protective against respiratory diseases and MNDs in weaners. CONCLUSIONS Respiratory diseases and MNDs in weaners appeared as the primary aetiologies for antimicrobial group treatments on Dutch pig farms. Prioritizing farm practices that enhance biosecurity and animal welfare is crucial for controlling these diseases and, consequently, reducing AMU.
Affiliation(s)
- Panagiotis Mallioris
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Roosmarijn E C Luiken
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Jaap A Wagenaar
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands; Wageningen Bioveterinary Research, Lelystad, the Netherlands
- Arjan Stegeman
- Division of Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
- Lapo Mughini-Gras
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands; National Institute for Public Health and the Environment, Centre for Infectious Disease Control, Bilthoven, the Netherlands
2
Zhukova MA, Chinn LK, Cheek C, Sukmanova AA, Kustova TA, Grigorenko EL. Impact of maternal institutionalization on children's language development: A multidisciplinary study. J Exp Child Psychol 2025; 253:106197. [PMID: 39938244 DOI: 10.1016/j.jecp.2025.106197] [Received: 01/08/2024] [Revised: 12/13/2024] [Accepted: 01/03/2025] [Indexed: 02/14/2025]
Abstract
Research has uncovered extensive negative effects of institutional rearing on development, including language deficits. However, less is known about how these effects may be passed down vertically from mothers to children. The current study examined this pathway with respect to language development using behavioral and neural measures. Participants were mother-child dyads (children aged 8-71 months) where the mothers were either previously institutionalized in orphanages (n = 20) or not (n = 34). Mothers qualified for the study if they were 16 to 35 years of age, had a child aged 8 months to 5 years, and were native Russian speakers. We hypothesized that mothers with a history of institutionalization would provide a linguistically impoverished environment, leading to lower language scores in their children and altered neural responses to language violations. Contrary to our hypotheses, maternal history of institutionalization was not significantly associated with child language abilities (expressive or receptive) or the frequency of conversational turns. However, mothers with a history of institutionalization spoke fewer words around their female offspring relative to mothers raised in biological families. Event-related potential (ERP) analyses revealed topography differences in children's P400 response during phonological processing associated with maternal institutionalization history. We were also able to predict with above-chance accuracy children whose mothers had a history of institutionalization using machine learning on ERP measures. These findings suggest the need for targeted interventions to support language development in children of mothers with a history of institutionalization.
Affiliation(s)
- Marina A Zhukova
- The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Tatiana A Kustova
- Sirius University of Science and Technology, Sirius, Krasnodar region 354340, Russia
- Elena L Grigorenko
- University of Houston, Houston, TX 77204, USA; Sirius University of Science and Technology, Sirius, Krasnodar region 354340, Russia; Baylor College of Medicine, Houston, TX 77030, USA; Child Study Center, Yale School of Medicine, New Haven, CT 06519, USA
3
Tanner J, Igarashi Y, Maekawa K. Speech rate effects on the realisation of multiple acoustic cues to the Japanese stop voicing contrast. The Journal of the Acoustical Society of America 2025; 157:2624-2635. [PMID: 40197544 DOI: 10.1121/10.0036393] [Received: 09/27/2024] [Accepted: 03/25/2025] [Indexed: 04/10/2025]
Abstract
The production of speech at different tempos has consequences for the articulation and perception of linguistic contrasts. For example, in fast speech, segments are often temporally constricted and subject to articulatory undershoot. Although listeners can compensate for rate differences in perceiving phonological contrasts, less is known about how the structure of multiple cues to a contrast is conditioned by changes in speech rate. This study explores how speech rate modulates seven temporal and nontemporal spectral cues to the Japanese stop voicing contrast in spontaneous speech. It is observed that individual cues are subject to variation as a function of speech rate, where all cues undergo reduction or neutralisation in fast speech and the relative importance of each cue changes as a function of speech rate. Ratios between the duration of the stop and surrounding vowels are most informative at slow rates, and the degree of closure voicing is most informative at faster rates. These findings illustrate how the realisation and informativity of multiple cues to a linguistic contrast are conditioned by the articulatory constraints present at different speech rates.
Affiliation(s)
- James Tanner
- Department of English Language and Linguistics, University of Glasgow, G12 8QQ Glasgow, United Kingdom
- Yosuke Igarashi
- Research Department, National Institute for Japanese Language and Linguistics, Tachikawa, Tokyo 190-8561, Japan
- Graduate University for Advanced Studies, SOKENDAI, Kanagawa 240-0193, Japan
- Kikuo Maekawa
- Director-General, National Institute for Japanese Language and Linguistics, Tachikawa, Tokyo 190-8561, Japan
4
Lange TM, Gültas M, Schmitt AO, Heinrich F. optRF: Optimising random forest stability by determining the optimal number of trees. BMC Bioinformatics 2025; 26:95. [PMID: 40165065 PMCID: PMC11959736 DOI: 10.1186/s12859-025-06097-1] [Received: 11/11/2024] [Accepted: 02/26/2025] [Indexed: 04/02/2025] [Open access]
Abstract
Machine learning is frequently used to make decisions based on big data. Among these techniques, random forest is particularly prominent. Although random forest is known to have many advantages, one aspect that is often overlooked is that it is a non-deterministic method that can produce different models using the same input data. This can have severe consequences for decision-making processes. In this study, we introduce a method to quantify the impact of non-determinism on predictions, variable importance estimates, and decisions based on the predictions or variable importance estimates. Our findings demonstrate that increasing the number of trees in random forests enhances the stability in a non-linear way while computation time increases linearly. Consequently, we conclude that there exists an optimal number of trees for any given data set that maximises the stability without unnecessarily increasing the computation time. Based on these findings, we have developed the R package optRF, which models the relationship between the number of trees and the stability of random forest, providing recommendations for the optimal number of trees for any given data set.
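The trade-off described in this abstract (stability improving non-linearly while cost grows linearly with the number of trees) can be illustrated with a toy sketch. This is not optRF and not a real random forest: each "tree" here is assumed to be nothing more than the mean of a bootstrap sample, and the function names, data, and tree counts are all hypothetical. It only shows how run-to-run spread of an averaged ensemble shrinks as more base learners are added.

```python
import random
import statistics


def ensemble_prediction(y, num_trees, rng):
    """Average of num_trees toy base learners.

    Each base learner is just the mean of one bootstrap sample of y
    (an assumption made for illustration; a real tree would split on features).
    """
    n = len(y)
    tree_preds = []
    for _ in range(num_trees):
        sample = [y[rng.randrange(n)] for _ in range(n)]
        tree_preds.append(sum(sample) / n)
    return sum(tree_preds) / num_trees


def run_to_run_spread(y, num_trees, repeats=30):
    """Standard deviation of the ensemble prediction across repeated fits.

    Every repeat uses the same data but a different random seed, so the
    spread reflects only the non-determinism of the ensemble itself.
    """
    preds = [
        ensemble_prediction(y, num_trees, random.Random(seed))
        for seed in range(repeats)
    ]
    return statistics.pstdev(preds)


# Hypothetical data set: 50 values drawn once with a fixed seed.
data_rng = random.Random(0)
y = [data_rng.gauss(10.0, 2.0) for _ in range(50)]

# Work grows linearly with the tree count; the spread shrinks with diminishing returns.
spreads = {k: run_to_run_spread(y, k) for k in (1, 10, 100)}
for k, s in spreads.items():
    print(f"{k:>3} trees: run-to-run spread = {s:.4f}")
```

In this toy setting the spread falls roughly with the square root of the number of base learners, which is one simple way a stability curve can flatten while computation keeps increasing.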
Affiliation(s)
- Thomas M Lange
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany
- Mehmet Gültas
- Faculty of Agriculture, South Westphalia University of Applied Sciences, Lübecker Ring 2, 59494, Soest, Germany
- Center for Integrated Breeding Research (Cibreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Armin O Schmitt
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany
- Center for Integrated Breeding Research (Cibreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
- Felix Heinrich
- Breeding Informatics Group, Georg-August University, Margarethe Von Wrangell-Weg 7, 37075, Göttingen, Germany
5
Zhao X, Wu M, Liu H, Wang Y, Zhang Z, Liu Y, Zhang YX. Asymmetric Inter-Hemisphere Communication Contributes to Speech Acquisition of Toddlers with Cochlear Implants. Advanced Science (Weinheim, Baden-Wurttemberg, Germany) 2025:e2309194. [PMID: 40163364 DOI: 10.1002/advs.202309194] [Received: 11/28/2023] [Revised: 12/03/2024] [Indexed: 04/02/2025]
Abstract
How the lateralized language network and its functions emerge with early auditory experiences remains largely unknown. Here, early auditory development is examined using repeated optical imaging for cochlear implanted (CI) toddlers with congenital deafness from onset of restored hearing to around one year of CI hearing experiences. Machine learning models are constructed to resolve how functional organization of the bilateral language network and its sound processing support the CI children's post-implantation development of auditory and verbal communication skills. Behavioral improvement is predictable by cortical processing as well as by network organization changes, with the highest classification accuracy of 81.57%. For cortical processing, behavioral prediction is better for the left than the right hemisphere and for speech than non-speech processing. For network organization, the best prediction is obtained for resting state, with greater contribution from inter-hemisphere connections between non-homologous regions than from within-hemisphere connections. Most interestingly, systematic connectivity-to-activity models reveal that speech processing of the left language network is developmentally supported largely by global network organization, particularly asymmetric inter-hemisphere communication, rather than functional segregation of local network. These findings collectively confirm the importance of asymmetric inter-hemisphere communication in formation of the lateralized language network and its functional development with early auditory experiences.
Affiliation(s)
- Xue Zhao
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China
- Meiyun Wu
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China
- Haotian Liu
- Department of Otolaryngology Head and Neck Surgery, West China Hospital of Sichuan University, Chengdu, 610041, China
- Yuyang Wang
- Department of Otolaryngology Head and Neck Surgery, Hunan Provincial People's Hospital (First Affiliated Hospital of Hunan Normal University), Changsha, 410005, China
- Zhikai Zhang
- Department of Otolaryngology Head and Neck Surgery, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, 100025, China
- Yuhe Liu
- Department of Otolaryngology Head and Neck Surgery, Beijing Friendship Hospital, Capital Medical University, Beijing, 100050, China
- Yu-Xuan Zhang
- State Key Laboratory of Cognitive Neuroscience and Learning, Beijing Normal University, Beijing, 100875, China
6
Ivanov MV, Kopeykina AS, Kazakova EM, Tarasova IA, Sun Z, Postoenko VI, Yang J, Gorshkov MV. Modified Decision Tree with Custom Splitting Logic Improves Generalization across Multiple Brains' Proteomic Data Sets of Alzheimer's Disease. J Proteome Res 2025; 24:1053-1066. [PMID: 39984290 DOI: 10.1021/acs.jproteome.4c00677] [Indexed: 02/23/2025]
Abstract
Many factors negatively affect the generalization of findings in discovery proteomics, including differences between patient cohorts and a variety of experimental conditions. We present a machine-learning-based workflow for proteomics data analysis that aims to improve generalizability across multiple data sets. In particular, we customized the decision tree model by introducing a new parameter, min_groups_leaf, which regulates the presence of samples from each data set inside the model's leaves. Further, we analyzed the trend of the feature importance curve as a function of the novel parameter in order to select a list of proteins with significantly improved generalization. The developed workflow was tested using five proteomic data sets obtained for post-mortem human brain samples of Alzheimer's disease. The data sets consisted of 535 LC-MS/MS acquisition files. The results were obtained for two different pipelines of data processing: (1) MS1-only processing based on the DirectMS1 search engine and (2) a standard MS/MS-based one. Using the developed workflow, we found seven proteins with expression patterns that were unique to asymptomatic Alzheimer patients. Two of them, Serotransferrin TRFE and DNA repair nuclease APEX1, may be potentially important for explaining the lack of dementia in patients with the presence of neuritic plaques and neurofibrillary tangles.
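The abstract does not spell out how min_groups_leaf enters the splitting logic. One plausible reading, sketched below purely as an assumption, is that a candidate split is rejected unless each resulting leaf contains samples from at least min_groups_leaf distinct data sets. The helper names and the toy values are hypothetical and are not taken from the authors' code.

```python
def leaf_has_enough_groups(group_labels, min_groups_leaf):
    """True if the leaf holds samples from at least min_groups_leaf distinct data sets."""
    return len(set(group_labels)) >= min_groups_leaf


def split_is_admissible(values, group_labels, threshold, min_groups_leaf):
    """Check a candidate split `value <= threshold` against the group constraint.

    Assumed rule (not confirmed by the abstract): both child leaves must
    contain samples from at least min_groups_leaf different data sets.
    """
    left = [g for v, g in zip(values, group_labels) if v <= threshold]
    right = [g for v, g in zip(values, group_labels) if v > threshold]
    return (leaf_has_enough_groups(left, min_groups_leaf)
            and leaf_has_enough_groups(right, min_groups_leaf))


# Hypothetical protein intensities from two data sets, "A" and "B".
# Here the feature mostly separates the data sets rather than the biology.
batch_driven = split_is_admissible(
    values=[0.1, 0.2, 0.3, 0.8, 0.9, 1.0],
    group_labels=["A", "A", "A", "B", "B", "B"],
    threshold=0.5,
    min_groups_leaf=2,
)

# Here both leaves mix samples from both data sets.
shared_signal = split_is_admissible(
    values=[0.1, 0.2, 0.8, 0.9],
    group_labels=["A", "B", "A", "B"],
    threshold=0.5,
    min_groups_leaf=2,
)

print(batch_driven, shared_signal)
```

Under this reading, raising the parameter discourages splits that merely reflect differences between data sets, which is consistent with the stated aim of better generalization across cohorts.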
Affiliation(s)
- Mark V Ivanov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Anna S Kopeykina
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Elizaveta M Kazakova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Irina A Tarasova
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Zhao Sun
- Clinical Systems Biology Key Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Department of Gastrointestinal Surgery, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Institute of Infection and Immunity, Henan Academy of Innovations in Medical Science, Zhengzhou 450052, China
- Valeriy I Postoenko
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
- Jinghua Yang
- Clinical Systems Biology Key Laboratory, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Institute of Infection and Immunity, Henan Academy of Innovations in Medical Science, Zhengzhou 450052, China
- Mikhail V Gorshkov
- V. L. Talrose Institute for Energy Problems of Chemical Physics, N. N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences, Moscow 119334, Russia
7
Baron C, Mehanna P, Daneault C, Hausermann L, Busseuil D, Tardif JC, Dupuis J, Des Rosiers C, Ruiz M, Hussin JG. Insights into heart failure metabolite markers through explainable machine learning. Comput Struct Biotechnol J 2025; 27:1012-1022. [PMID: 40160858 PMCID: PMC11953987 DOI: 10.1016/j.csbj.2025.02.041] [Received: 12/03/2024] [Revised: 02/25/2025] [Accepted: 02/27/2025] [Indexed: 04/02/2025] [Open access]
Abstract
Understanding molecular traits through metabolomics offers an avenue to tailor cardiovascular prevention, diagnosis and treatment strategies more effectively. This study focuses on the application of machine learning (ML) and explainable artificial intelligence (XAI) algorithms to detect discriminant molecular signatures in heart failure (HF). We aim to uncover metabolites with significant predictive value by analyzing targeted metabolomics data through ML and XAI algorithms. After quality control, we analyzed 55 metabolites from 124 plasma samples, including 53 HF patients and 71 controls, comparing Ridge Logistic Regression, Support Vector Machine and eXtreme Gradient Boosting models. All achieved high accuracy in predicting group labels: 84.0% [95% CI: 75.3 - 92.7], 85.73% [95% CI: 78.6 - 92.9], and 84.8% [95% CI: 76.1 - 93.5], respectively. Permutation-based variable importance and Local Interpretable Model-agnostic Explanations (LIME) were used for group-level and individual-level explainability, respectively, complemented by Friedman's H-statistic for variable interactions, yielding reliable, explainable insights into the ML models. Metabolites well known for their association with HF, such as glucose and cholesterol, and the more recently described C18:1 carnitine were reaffirmed in our analysis. The novel discovery of lignoceric acid (C24:0 fatty acid) as a critical discriminator was confirmed in a replication cohort, underscoring its potential as a metabolite marker. Furthermore, our study highlights the utility of 2-way variable interaction analysis in unveiling a network of metabolite interactions essential for accurate disease prediction. The results demonstrate our approach's efficacy in identifying key metabolites and their interactions, illustrating the power of ML and XAI in advancing personalized healthcare solutions.
Affiliation(s)
- Cantin Baron
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Université de Montréal, Montréal, Quebec, Canada
- Pamela Mehanna
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- David Busseuil
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Jean-Claude Tardif
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
- Jocelyn Dupuis
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
- Christine Des Rosiers
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de Nutrition, Université de Montréal, Montréal, Quebec, Canada
- Matthieu Ruiz
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Département de Nutrition, Université de Montréal, Montréal, Quebec, Canada
- Julie G. Hussin
- Département de Biochimie et de Médecine Moléculaire, Université de Montréal, Montréal, Quebec, Canada
- Montreal Heart Institute, Research Center, Montréal, Quebec, Canada
- Mila - Quebec AI Institute, Université de Montréal, Montréal, Quebec, Canada
- Département de médecine, Université de Montréal, Montréal, Quebec, Canada
8
Wade BSC, Pindale R, Luccarelli J, Li S, Meisner RC, Seiner SJ, Camprodon JA, Henry ME. Prediction of individual treatment allocation between electroconvulsive therapy or ketamine using the Personalized Advantage Index. NPJ Digit Med 2025; 8:127. [PMID: 40016503 PMCID: PMC11868618 DOI: 10.1038/s41746-025-01523-3] [Received: 03/15/2024] [Accepted: 02/17/2025] [Indexed: 03/01/2025] [Open access]
Abstract
Electroconvulsive therapy (ECT) and ketamine are effective treatments for depression; however, evidence-based guidelines are needed to inform individual treatment selection. We adapted the Personalized Advantage Index (PAI) using machine learning to predict optimal treatment assignment to ECT or ketamine using electronic health record (EHR) data on 2506 ECT and 196 ketamine patients. Depressive symptoms were evaluated using the Quick Inventory of Depressive Symptomatology (QIDS) before and during acute treatment. Propensity score matching across treatments was used to address confounding by indication, yielding a sample of 392 patients (n = 196 per treatment). Models predicted differential minimum QIDS scores (min-QIDS) over acute treatment using pretreatment EHR measures, and SHAP values identified prescriptive predictors. Patients with large PAI scores who received their predicted optimal treatment had significantly lower min-QIDS than those who received the non-optimal treatment (mean difference = 1.19 [95% CI: 0.32, ∞], t = 2.25, q < 0.05, d = 0.26). Our model identified candidate pretreatment factors to provide actionable, effective antidepressant treatment selection guidelines.
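The Personalized Advantage Index idea in this abstract can be sketched in a few lines. The sketch assumes the PAI is the difference between a patient's predicted min-QIDS under the two treatments, and that the treatment with the lower predicted score is labeled optimal; the sign convention, the function names, and the four patients below are hypothetical and are not taken from the study.

```python
def personalized_advantage(pred_ect, pred_ketamine):
    """Assumed PAI: predicted min-QIDS under ECT minus that under ketamine.

    A positive value means ketamine is predicted to give the lower
    (better) score; a negative value favors ECT.
    """
    return pred_ect - pred_ketamine


def recommended_treatment(pred_ect, pred_ketamine):
    """Pick the treatment with the lower predicted min-QIDS."""
    return "ECT" if pred_ect < pred_ketamine else "ketamine"


# Hypothetical patients: (treatment received, predicted min-QIDS under ECT,
# predicted min-QIDS under ketamine, observed min-QIDS).
patients = [
    ("ECT", 8.0, 11.0, 7),
    ("ketamine", 9.0, 12.0, 11),
    ("ketamine", 13.0, 9.0, 8),
    ("ECT", 12.0, 10.0, 12),
]

optimal, non_optimal = [], []
for received, pred_ect, pred_ket, observed in patients:
    if received == recommended_treatment(pred_ect, pred_ket):
        optimal.append(observed)
    else:
        non_optimal.append(observed)

# Compare observed outcomes between patients who did and did not
# receive their predicted optimal treatment.
mean_optimal = sum(optimal) / len(optimal)
mean_non_optimal = sum(non_optimal) / len(non_optimal)
mean_difference = mean_non_optimal - mean_optimal
print(mean_optimal, mean_non_optimal, mean_difference)
```

In practice the two predictions would come from models fitted on pretreatment measures, and the comparison would typically be restricted to patients with large absolute PAI values, as the abstract describes.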
Affiliation(s)
- Benjamin S C Wade
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Ryan Pindale
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- James Luccarelli
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Shuang Li
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
- Robert C Meisner
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
- Joan A Camprodon
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Michael E Henry
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
9
Sun P, Wang X, Wang S, Jia X, Feng S, Chen J, Fang Y. Bipolar disorder: Construction and analysis of a joint diagnostic model using random forest and feedforward neural networks. IBRO Neurosci Rep 2024; 17:145-153. [PMID: 39206162 PMCID: PMC11350441 DOI: 10.1016/j.ibneur.2024.07.007] [Received: 12/25/2023] [Revised: 07/22/2024] [Accepted: 07/30/2024] [Indexed: 09/04/2024] [Open access]
Abstract
Background To construct a diagnostic model for Bipolar Disorder (BD) depressive phase using peripheral tissue RNA data from patients and combining Random Forest with Feedforward Neural Network methods. Methods Datasets GSE23848, GSE39653, and GSE69486 were selected, and differential gene expression analysis was conducted using the limma package in R. Key genes from the differentially expressed genes were identified using the Random Forest method. These key genes' expression levels in each sample were used to train a Feedforward Neural Network model. Techniques like L1 regularization, early stopping, and dropout layers were employed to prevent model overfitting. Model performance was then validated, followed by GO, KEGG, and protein-protein interaction network analyses. Results The final model was a Feedforward Neural Network with two hidden layers and two dropout layers, comprising 2345 trainable parameters. Model performance on the validation set, assessed through 1000 bootstrap resampling iterations, demonstrated a specificity of 0.769 (95% CI 0.571-1.000), sensitivity of 0.818 (95% CI 0.533-1.000), AUC value of 0.832 (95% CI 0.642-0.979), and accuracy of 0.792 (95% CI 0.625-0.958). Enrichment analysis of key genes indicated no significant enrichment in any known pathways. Conclusion Key genes with biological significance were identified based on the decrease in Gini coefficient within the Random Forest model. The combined use of Random Forest and Feedforward Neural Network to establish a diagnostic model showed good classification performance in Bipolar Disorder.
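The confidence intervals reported above come from bootstrap resampling of the validation set. A minimal percentile-bootstrap sketch is given below; it assumes a simple percentile interval (the abstract does not state which bootstrap variant was used), and the 24 labels are hypothetical, chosen only so that the point estimate (19/24, about 0.792) is of similar size to the reported accuracy.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric on paired labels.

    Samples are resampled with replacement n_boot times; the interval is
    read off the sorted metric values (an assumed, simple variant).
    """
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx],
                            [y_pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical validation labels: 11 cases (1) and 13 controls (0).
y_true = [1] * 11 + [0] * 13
# Hypothetical predictions: 9 of 11 cases and 10 of 13 controls correct.
y_pred = [1] * 9 + [0] * 2 + [0] * 10 + [1] * 3

point = accuracy(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred, accuracy)
print(f"accuracy = {point:.3f}, 95% CI = [{lower:.3f}, {upper:.3f}]")
```

With a validation set this small, the interval is wide, which matches the broad intervals quoted in the abstract.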
Affiliation(s)
- Ping Sun
- Qingdao Mental Health Center, Shandong 266034, China
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Xiangwen Wang
- Qingdao Mental Health Center, Shandong 266034, China
- School of Mental Health, Research Institute of Mental Health, Jining Medical University, Shandong 272002, China
- Shenghai Wang
- Qingdao Mental Health Center, Shandong 266034, China
- Xueyu Jia
- Department of Medicine, Qingdao University, Shandong 266000, China
- Shunkang Feng
- Qingdao Mental Health Center, Shandong 266034, China
- Jun Chen
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Department of Psychiatry & Affective Disorders Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai 201108, China
- Yiru Fang
- Clinical Research Center, Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, Shanghai 200030, China
- Department of Psychiatry & Affective Disorders Center, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Shanghai Key Laboratory of Psychotic Disorders, Shanghai 201108, China
- State Key Laboratory of Neuroscience, Shanghai Institute for Biological Sciences, CAS, Shanghai 200031, China
10
Seki T, Takiguchi T, Akagi Y, Ito H, Kubota K, Miyake K, Okada M, Kawazoe Y. Iterative random forest-based identification of a novel population with high risk of complications post non-cardiac surgery. Sci Rep 2024; 14:26741. [PMID: 39500963 PMCID: PMC11538396 DOI: 10.1038/s41598-024-78482-4] [Received: 08/08/2024] [Accepted: 10/31/2024] [Indexed: 11/08/2024] [Open access]
Abstract
Assessing the risk of postoperative cardiovascular events before performing non-cardiac surgery is clinically important. The current risk score systems for preoperative evaluation may not adequately represent a small subset of high-risk populations. Accordingly, this study aimed at applying iterative random forest to analyze combinations of factors that could potentially be clinically valuable in identifying these high-risk populations. To this end, we used the Japan Medical Data Center database, which includes claims data from Japan between January 2005 and April 2021, and employed iterative random forests to extract factor combinations that influence outcomes. The analysis demonstrated that a combination of a prior history of stroke and extremely low LDL-C levels was associated with a high non-cardiac postoperative risk. The incidence of major adverse cardiovascular events in the population characterized by the incidence of previous stroke and extremely low LDL-C levels was 15.43 events per 100 person-30 days [95% confidence interval, 6.66-30.41] in the test data. At this stage, the results only show correlation rather than causation; however, these findings may offer valuable insights for preoperative risk assessment in non-cardiac surgery.
Affiliation(s)
- Tomohisa Seki
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Toru Takiguchi
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Yu Akagi
- Department of Biomedical Informatics, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Hiromasa Ito
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Kazumi Kubota
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Kana Miyake
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Masafumi Okada
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Yoshimasa Kawazoe
- Department of Healthcare Information Management, The University of Tokyo Hospital, Tokyo, Japan
- Artificial Intelligence and Digital Twin in Healthcare, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
11
Wang Z, Whipp AM, Heinonen-Guzejev M, Foraster M, Júlvez J, Kaprio J. The association between urban land use and depressive symptoms in young adulthood: a FinnTwin12 cohort study. Journal of Exposure Science & Environmental Epidemiology 2024; 34:770-779. [PMID: 38081942 PMCID: PMC11446816 DOI: 10.1038/s41370-023-00619-w] [Received: 04/27/2023] [Revised: 11/20/2023] [Accepted: 11/22/2023] [Indexed: 10/04/2024]
Abstract
BACKGROUND Depressive symptoms lead to a serious public health burden and are considerably affected by the environment. Land use, describing the urban living environment, influences mental health, but complex relationship assessment is rare. OBJECTIVE We aimed to examine the complicated association between urban land use and depressive symptoms among young adults with differential land use environments, by applying multiple models. METHODS We included 1804 individual twins from the FinnTwin12 cohort, living in urban areas in 2012. There were eight types of land use exposures in three buffer radii. The depressive symptoms were assessed through the General Behavior Inventory (GBI) in young adulthood (mean age: 24.1). First, K-means clustering was performed to distinguish participants with differential land use environments. Then, linear elastic net penalized regression and eXtreme Gradient Boosting (XGBoost) were used to reduce dimensions or prioritize for importance and examine the linear and nonlinear relationships. RESULTS Two clusters were identified: one is more typical of city centers and another of suburban areas. A heterogeneous pattern in results was detected from the linear elastic net penalized regression model among the overall sample and the two separated clusters. Agricultural residential land use in a 100 m buffer contributed to GBI most (coefficient: 0.097) in the "suburban" cluster among 11 selected exposures after adjustment with demographic covariates. In the "city center" cluster, none of the land use exposures was associated with GBI, even after further adjustment with social indicators. From the XGBoost models, we observed that ranks of the importance of land use exposures on GBI and their nonlinear relationships are also heterogeneous in the two clusters. IMPACT This study examined the complex relationship between urban land use and depressive symptoms among young adults in Finland. 
Based on the FinnTwin12 cohort, we first identified two distinct clusters of participants with different urban land use environments. We then employed two pluralistic models, elastic net penalized regression and XGBoost, and revealed both linear and nonlinear relationships between urban land use and depressive symptoms, which also varied between the two clusters. The findings suggest that future analyses involving land use and the broader environmental profile should consider aspects such as population heterogeneity and linearity for a comprehensive assessment.
Affiliation(s)
- Zhiyang Wang
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
| | - Alyce M Whipp
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
| | | | - Maria Foraster
- PHAGEX Research Group, Blanquerna School of Health Science, Universitat Ramon Llull (URL), Barcelona, Spain
- ISGlobal-Instituto de Salud Global de Barcelona Campus MAR, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- CIBER Epidemiología y Salud Pública (CIBEREsp), Madrid, Spain
| | - Jordi Júlvez
- ISGlobal-Instituto de Salud Global de Barcelona Campus MAR, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Spain
- Clinical and Epidemiological Neuroscience (NeuroÈpia), Institut d'Investigació Sanitària Pere Virgili (IISPV), Reus, Spain
| | - Jaakko Kaprio
- Institute for Molecular Medicine Finland, Helsinki Institute of Life Science, University of Helsinki, Helsinki, Finland.
- Department of Public Health, University of Helsinki, Helsinki, Finland.
| |
|
12
|
Mallioris P, Luiken REC, Tobias T, Vonk J, Wagenaar JA, Stegeman A, Mughini-Gras L. Risk factors for antimicrobial use in Dutch pig farms: A cross-sectional study. Res Vet Sci 2024; 174:105307. [PMID: 38781817 DOI: 10.1016/j.rvsc.2024.105307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Revised: 05/04/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024]
Abstract
BACKGROUND Antimicrobial use (AMU) has decreased significantly in Dutch pig farms since 2009. However, this decrease has stagnated recently, with relatively high AMU levels persisting mainly among weaners. The aim of this study was to identify farm-level characteristics associated with: i) total AMU and ii) use of specific antimicrobial classes. METHODS In 2020, cross-sectional data from 154 Dutch pig farms were collected, including information on AMU and farm characteristics. A mixed-effects conditional Random Forest analysis was applied to select the subset of features that was best associated with AMU. RESULTS The main risk factors for total AMU in weaners were vaccination for PRRS in sucklings, being a conventional farm (vs. not), high within-farm density, and early weaning. The main protective factors for total AMU in sows/sucklings were E. coli vaccination in sows and having boars for estrus detection from own production. Regarding antimicrobial class-specific outcomes, several risk factors overlapped for weaners and sows/sucklings, such as farmer's non-tertiary education, not having free-sow systems during lactation, and conventional farming. An additional risk factor for weaners was having fully slatted floors. For fatteners, the main risk factor for total AMU was PRRS vaccination in sucklings. CONCLUSIONS Several factors were found here to be associated with AMU. Some were known, but others were novel, such as farmer's tertiary education, low pig aggression, and free-sow systems, which were all associated with lower AMU. These factors provide targets for developing tailor-made interventions, as well as an evidence-based selection of features for further causal assessment and mediation analysis.
Affiliation(s)
- Panagiotis Mallioris
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands.
| | - Roosmarijn E C Luiken
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
| | - Tijs Tobias
- Department of Population Health Sciences, Farm Animal Health unit, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands; Swine Health Department, Royal GD, Deventer, the Netherlands
| | - John Vonk
- John Vonk DVM, BSc Agriculture, De Varkenspraktijk, Obrechtstraat 2, 5344 AT, Oss, the Netherlands
| | - Jaap A Wagenaar
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands; Wageningen Bioveterinary Research, Lelystad, the Netherlands
| | - Arjan Stegeman
- Department of Population Health Sciences, Farm Animal Health unit, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
| | - Lapo Mughini-Gras
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands; National Institute for Public Health and the Environment, Centre for Infectious Disease Control, Bilthoven, the Netherlands
| |
|
13
|
Cheek CL, Lindner P, Grigorenko EL. Statistical and Machine Learning Analysis in Brain-Imaging Genetics: A Review of Methods. Behav Genet 2024; 54:233-251. [PMID: 38336922 DOI: 10.1007/s10519-024-10177-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2023] [Accepted: 01/24/2024] [Indexed: 02/12/2024]
Abstract
Brain-imaging-genetic analysis is an emerging field of research that aims at aggregating data from neuroimaging modalities, which characterize brain structure or function, and genetic data, which capture the structure and function of the genome, to explain or predict normal (or abnormal) brain performance. Brain-imaging-genetic studies offer great potential for understanding complex brain-related diseases/disorders of genetic etiology. Still, a combined brain-wide genome-wide analysis is difficult to perform as typical datasets fuse multiple modalities, each with high dimensionality, unique correlational landscapes, and often low statistical signal-to-noise ratios. In this review, we outline the progress in brain-imaging-genetic methodologies, from early massive univariate to current deep learning approaches, highlighting each approach's strengths and weaknesses and relating it to the field's development. We conclude by discussing selected remaining challenges and prospects for the field.
Affiliation(s)
- Connor L Cheek
- Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA.
- Department of Physics, University of Houston, Houston, TX, USA.
| | - Peggy Lindner
- Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
- Department of Information Science Technology, University of Houston, Houston, TX, USA
| | - Elena L Grigorenko
- Texas Institute for Evaluation, Measurement, and Statistics, University of Houston, Houston, TX, USA
- Department of Psychology, University of Houston, Houston, TX, USA
- Baylor College of Medicine, Houston, TX, USA
- Sirius University of Science and Technology, Sochi, Russia
| |
|
14
|
Bramer LM, Dixon HM, Rohlman D, Scott RP, Miller RL, Kincl L, Herbstman JB, Waters KM, Anderson KA. PM2.5 Is Insufficient to Explain Personal PAH Exposure. GEOHEALTH 2024; 8:e2023GH000937. [PMID: 38344245 PMCID: PMC10858395 DOI: 10.1029/2023gh000937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 01/19/2024] [Accepted: 01/22/2024] [Indexed: 10/28/2024]
Abstract
To understand how chemical exposure can impact health, researchers need tools that capture the complexities of personal chemical exposure. In practice, fine particulate matter (PM2.5) air quality index (AQI) data from outdoor stationary monitors and Hazard Mapping System (HMS) smoke density data from satellites are often used as proxies for personal chemical exposure, but do not capture total chemical exposure. Silicone wristbands can quantify more individualized exposure data than stationary air monitors or smoke satellites. However, it is not understood how these proxy measurements compare to chemical data measured from wristbands. In this study, participants wore daily wristbands, carried a phone that recorded locations, and answered daily questionnaires for a 7-day period in multiple seasons. We gathered publicly available daily PM2.5 AQI data and HMS data. We analyzed wristbands for 94 organic chemicals, including 53 polycyclic aromatic hydrocarbons. Wristband chemical detections and concentrations, behavioral variables (e.g., time spent indoors), and environmental conditions (e.g., PM2.5 AQI) significantly differed between seasons. Machine learning models were fit to predict personal chemical exposure using PM2.5 AQI only, HMS only, and a multivariate feature set including PM2.5 AQI, HMS, and other environmental and behavioral information. On average, the multivariate models increased predictive accuracy by approximately 70% compared to either the AQI model or the HMS model for all chemicals modeled. This study provides evidence that PM2.5 AQI data alone or HMS data alone is insufficient to explain personal chemical exposures. Our results identify additional key predictors of personal chemical exposure.
Affiliation(s)
- Lisa M. Bramer
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
| | - Holly M. Dixon
- Department of Environmental and Molecular Toxicology, Food Safety and Environmental Stewardship Program, Oregon State University, Corvallis, OR, USA
| | - Diana Rohlman
- College of Health, Oregon State University, Corvallis, OR, USA
| | - Richard P. Scott
- Department of Environmental and Molecular Toxicology, Food Safety and Environmental Stewardship Program, Oregon State University, Corvallis, OR, USA
| | - Rachel L. Miller
- Division of Clinical Immunology, Icahn School of Medicine at Mount Sinai, New York City, NY, USA
| | - Laurel Kincl
- College of Health, Oregon State University, Corvallis, OR, USA
| | - Julie B. Herbstman
- Department of Environmental Health Sciences, Columbia Center for Children's Environmental Health, Mailman School of Public Health, Columbia University, New York City, NY, USA
| | - Katrina M. Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA, USA
- Department of Environmental and Molecular Toxicology, Food Safety and Environmental Stewardship Program, Oregon State University, Corvallis, OR, USA
| | - Kim A. Anderson
- Department of Environmental and Molecular Toxicology, Food Safety and Environmental Stewardship Program, Oregon State University, Corvallis, OR, USA
| |
|
15
|
Wade B, Pindale R, Camprodon J, Luccarelli J, Li S, Meisner R, Seiner S, Henry M. Individual Prediction of Optimal Treatment Allocation Between Electroconvulsive Therapy or Ketamine using the Personalized Advantage Index. RESEARCH SQUARE 2023:rs.3.rs-3682009. [PMID: 38077094 PMCID: PMC10705694 DOI: 10.21203/rs.3.rs-3682009/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
Introduction Electroconvulsive therapy (ECT) and ketamine are two effective treatments for depression with similar efficacy; however, individual patient outcomes may be improved by models that predict optimal treatment assignment. Here, we adapt the Personalized Advantage Index (PAI) algorithm using machine learning to predict optimal treatment assignment between ECT and ketamine using medical record data from a large, naturalistic patient cohort. We hypothesized that patients who received a treatment predicted to be optimal would have significantly better outcomes following treatment compared to those who received a non-optimal treatment. Methods Data on 2526 ECT and 235 mixed IV ketamine and esketamine patients from McLean Hospital were aggregated. Depressive symptoms were measured using the Quick Inventory of Depressive Symptomatology (QIDS) before and during acute treatment. Patients were matched between treatments on pretreatment QIDS, age, inpatient status, and psychotic symptoms using a 1:1 ratio, yielding a sample of 470 patients (n=235 per treatment). Random forest models were trained to predict differential patient-wise minimum QIDS scores achieved during acute treatment (min-QIDS) for ECT and ketamine using pretreatment patient measures. Analysis of Shapley Additive exPlanations (SHAP) values identified predictors of differential outcomes between treatments. Results Twenty-seven percent of patients with the largest PAI scores who received a treatment predicted to be optimal had significantly lower min-QIDS scores compared to those who received a non-optimal treatment (mean difference=1.6, t=2.38, q<0.05, Cohen's D=0.36). Analysis of SHAP values identified prescriptive pretreatment measures. Conclusions Patients assigned to a treatment predicted to be optimal had significantly better treatment outcomes. 
Our model identified pretreatment patient factors captured in medical records that can provide interpretable and actionable guidelines for treatment selection.
Affiliation(s)
- Benjamin Wade
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Ryan Pindale
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Joan Camprodon
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - James Luccarelli
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Shuang Li
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
| | - Robert Meisner
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
| | - Stephen Seiner
- Department of Psychiatry, McLean Hospital, Belmont, MA, USA
| | - Michael Henry
- Department of Psychiatry, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| |
|
16
|
Heinrich F, Lange TM, Kircher M, Ramzan F, Schmitt AO, Gültas M. Exploring the potential of incremental feature selection to improve genomic prediction accuracy. Genet Sel Evol 2023; 55:78. [PMID: 37946104 PMCID: PMC10634161 DOI: 10.1186/s12711-023-00853-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/03/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023] Open
Abstract
BACKGROUND The ever-increasing availability of high-density genomic markers in the form of single nucleotide polymorphisms (SNPs) enables genomic prediction, i.e. the inference of phenotypes based solely on genomic data, in the field of animal and plant breeding, where it has become an important tool. However, given the limited number of individuals, the abundance of variables (SNPs) can reduce the accuracy of prediction models due to overfitting or irrelevant SNPs. Feature selection can help to reduce the number of irrelevant SNPs and increase the model performance. In this study, we investigated an incremental feature selection approach based on ranking the SNPs according to the results of a genome-wide association study that we combined with random forest as a prediction model, and we applied it on several animal and plant datasets. RESULTS Applying our approach to different datasets yielded a wide range of outcomes, i.e. from a substantial increase in prediction accuracy in a few cases to minor improvements when only a fraction of the available SNPs were used. Compared with models using all available SNPs, our approach was able to achieve comparable performances with a considerably reduced number of SNPs in several cases. Our approach showcased state-of-the-art efficiency and performance while having a faster computation time. CONCLUSIONS The results of our study suggest that our incremental feature selection approach has the potential to improve prediction accuracy substantially. However, this gain seems to depend on the genomic data used. Even for datasets where the number of markers is smaller than the number of individuals, feature selection may still increase the performance of the genomic prediction. Our approach is implemented in R and is available at https://github.com/FelixHeinrich/GP_with_IFS/ .
Affiliation(s)
- Felix Heinrich
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany.
| | - Thomas Martin Lange
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
| | - Magdalena Kircher
- Institute for Animal Breeding and Genetics, University of Veterinary Medicine Hannover, Bünteweg 17p, 30559, Hannover, Germany
| | - Faisal Ramzan
- Institute of Animal and Dairy Sciences, University of Agriculture Faisalabad, Jail Road, 38000, Faisalabad, Pakistan
| | - Armin Otto Schmitt
- Breeding Informatics Group, Department of Animal Sciences, Georg-August University, Margarethe von Wrangell-Weg 7, 37075, Göttingen, Germany
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany
| | - Mehmet Gültas
- Center for Integrated Breeding Research (CiBreed), Georg-August University, Albrecht-Thaer-Weg 3, 37075, Göttingen, Germany.
- Faculty of Agriculture, South Westphalia University of Applied Sciences, 59494, Soest, Germany.
| |
|
17
|
Xu W, Sampson M. Prenatal and Childbirth Risk Factors of Postpartum Pain and Depression: A Machine Learning Approach. Matern Child Health J 2023; 27:286-296. [PMID: 36526882 DOI: 10.1007/s10995-022-03532-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 09/08/2022] [Indexed: 12/23/2022]
Abstract
OBJECTIVES About 74.91% of U.S. mothers experience postpartum pain at 6 to 10 weeks postpartum, and one in seven U.S. mothers suffers from postpartum depression. We used machine learning to explore physical, psychological, and social factors during pregnancy and childbirth and identify the most important predictors of postpartum pain and depression. METHODS Data were from the Listening To Mothers III survey (2012), a nationally representative sample of postpartum mothers. We randomly split the dataset into a training set (N = 1467) and a test set (N = 723). The final models included 34 risk factors identified from previous literature. Postpartum pain was measured as "to what extent the pain interferes with mothers' daily life". PHQ2 scores measured depression. We used the random forest model, an aggregate of many regression trees, to accommodate potential nonlinear/interaction effects. RESULTS In the test data set, our models explained 15.8% of the variance in pain and 27.1% of the variance in depression. The model's strongest predictors for postpartum pain were Cesarean delivery, holding back while communicating with providers, non-use of pain relief medications, and perceived discrimination. For depression scores, the model's strongest predictors included needing help for depression during pregnancy, perceived discrimination, holding back, gestational diabetes, and pain. CONCLUSIONS FOR PRACTICE Mental and physical health are intertwined and should be considered integratively in the perinatal period. Practitioners should also be aware of the importance of the patient-provider relationship, which predicts postpartum health both independently and in interaction with other risk factors.
Affiliation(s)
- Wen Xu
- Graduate College of Social Work, University of Houston, Houston, USA.
| | - McClain Sampson
- Graduate College of Social Work, University of Houston, Houston, USA
| |
|
18
|
Scornet E. Trees, forests, and impurity-based variable importance in regression. ANNALES DE L'INSTITUT HENRI POINCARÉ, PROBABILITÉS ET STATISTIQUES 2023. [DOI: 10.1214/21-aihp1240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Affiliation(s)
- Erwan Scornet
- Centre de Mathématiques Appliquées, Ecole Polytechnique, CNRS, Institut Polytechnique de Paris, Palaiseau, France
| |
|
19
|
Hapfelmeier A, Hornung R, Haller B. Efficient permutation testing of variable importance measures by the example of random forests. Comput Stat Data Anal 2023. [DOI: 10.1016/j.csda.2022.107689] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/08/2023]
|
20
|
Zhang B, Wang H. Exploring the advantages of the maximum entropy model in calibrating cellular automata for urban growth simulation: a comparative study of four methods. GISCIENCE & REMOTE SENSING 2022; 59:71-95. [DOI: 10.1080/15481603.2021.2016240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 11/25/2021] [Indexed: 09/01/2023]
Affiliation(s)
- Bin Zhang
- School of Resource and Environmental Sciences, Wuhan University, Wuhan, China
| | - Haijun Wang
- School of Resource and Environmental Sciences, Wuhan University, Wuhan, China
- Key Laboratory of Geographic Information System of MOE, Wuhan University, Wuhan, China
| |
|
21
|
Mallioris P, Teunis G, Lagerweij G, Joosten P, Dewulf J, Wagenaar JA, Stegeman A, Mughini-Gras L. Biosecurity and antimicrobial use in broiler farms across nine European countries: toward identifying farm-specific options for reducing antimicrobial usage. Epidemiol Infect 2022; 151:e13. [PMID: 36573356 PMCID: PMC9990406 DOI: 10.1017/s0950268822001960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 12/05/2022] [Accepted: 12/18/2022] [Indexed: 12/28/2022] Open
Abstract
Broiler chickens are among the main livestock sectors worldwide. With individual treatments being inapplicable, contrary to many other animal species, the need for antimicrobial use (AMU) is relatively high. AMU in animals is known to drive the emergence and spread of antimicrobial resistance (AMR). High farm biosecurity is a cornerstone for animal health and welfare, as well as food safety, as it protects animals from the introduction and spread of pathogens and therefore the need for AMU. The goal of this study was to identify the main biosecurity practices associated with AMU in broiler farms and to develop a statistical model that produces customised recommendations as to which biosecurity measures could be implemented on a farm to reduce its AMU, including a cost-effectiveness analysis of the recommended measures. AMU and biosecurity data were obtained cross-sectionally in 2014 from 181 broiler farms across nine European countries (Belgium, Bulgaria, Denmark, France, Germany, Italy, the Netherlands, Poland and Spain). Using mixed-effects random forest analysis (Mix-RF), recursive feature elimination was implemented to determine the biosecurity measures that best predicted AMU at the farm level. Subsequently, an algorithm was developed to generate AMU reduction scenarios based on the implementation of these measures. In the final Mix-RF model, 21 factors were present: 10 about internal biosecurity, 8 about external biosecurity and 3 about farm size and productivity, with the latter showing the largest (Gini) importance. Other AMU predictors, in order of importance, were the number of depopulation steps, compliance with a vaccination protocol for non-officially controlled diseases, and requiring visitors to check in before entering the farm. K-means clustering on the proximity matrix of the final Mix-RF model revealed that several measures interacted with each other, indicating that high AMU levels can arise for various reasons depending on the situation. 
The algorithm utilised the AMU predictive power of biosecurity measures while accounting also for their interactions, representing a first step toward aiding the decision-making process of veterinarians and farmers who are in need of implementing on-farm biosecurity measures to reduce their AMU.
Affiliation(s)
- Panagiotis Mallioris
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
| | - Gijs Teunis
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
| | - Giske Lagerweij
- National Institute for Public Health and the Environment, Centre for Infectious Disease Control, Bilthoven, the Netherlands
| | - Philip Joosten
- Veterinary Epidemiology Unit, Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
| | - Jeroen Dewulf
- Veterinary Epidemiology Unit, Department of Internal Medicine, Reproduction and Population Medicine, Faculty of Veterinary Medicine, Ghent University, Ghent, Belgium
| | - Jaap A. Wagenaar
- Division of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
| | - Arjan Stegeman
- Division of Farm Animal Health, Faculty of Veterinary Medicine, Utrecht University, Utrecht, the Netherlands
| | - Lapo Mughini-Gras
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, the Netherlands
- National Institute for Public Health and the Environment, Centre for Infectious Disease Control, Bilthoven, the Netherlands
| |
|
22
|
Jardillier R, Koca D, Chatelain F, Guyon L. Optimal microRNA Sequencing Depth to Predict Cancer Patient Survival with Random Forest and Cox Models. Genes (Basel) 2022; 13:2275. [PMID: 36553544 PMCID: PMC9777708 DOI: 10.3390/genes13122275] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Revised: 11/18/2022] [Accepted: 11/23/2022] [Indexed: 12/12/2022] Open
Abstract
(1) Background: tumor profiling enables patient survival prediction. The two essential parameters to be calibrated when designing a study based on tumor profiles from a cohort are the sequencing depth of RNA-seq technology and the number of patients. This calibration is carried out under cost constraints, and a compromise has to be found. In the context of survival data, the goal of this work is to benchmark the impact of the number of patients and of the sequencing depth of miRNA-seq and mRNA-seq on the predictive capabilities for both the Cox model with elastic net penalty and random survival forest. (2) Results: we first show that the Cox model and random survival forest provide comparable prediction capabilities, with significant differences for some cancers. Second, we demonstrate that miRNA and/or mRNA data improve prediction over clinical data alone. mRNA-seq data leads to slightly better prediction than miRNA-seq, with the notable exception of lung adenocarcinoma for which the tumor miRNA profile shows higher predictive power. Third, we demonstrate that the sequencing depth of RNA-seq data can be reduced for most of the investigated cancers without degrading the prediction abilities, allowing the creation of independent validation sets at a lower cost. Finally, we show that the number of patients in the training dataset can be reduced for the Cox model and random survival forest, allowing the use of different models on different patient subgroups.
Affiliation(s)
- Rémy Jardillier
- Univ. Grenoble Alpes, CEA, Inserm, IRIG, BioSanté U1292, BCI, 38000 Grenoble, France
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-Lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
| | - Dzenis Koca
- Univ. Grenoble Alpes, CEA, Inserm, IRIG, BioSanté U1292, BCI, 38000 Grenoble, France
| | - Florent Chatelain
- Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-Lab, Institute of Engineering University Grenoble Alpes, 38000 Grenoble, France
| | - Laurent Guyon
- Univ. Grenoble Alpes, CEA, Inserm, IRIG, BioSanté U1292, BCI, 38000 Grenoble, France
| |
|
23
|
Heindel P, Dey T, Feliz JD, Hentschel DM, Bhatt DL, Al-Omran M, Belkin M, Ozaki CK, Hussain MA. Predicting radiocephalic arteriovenous fistula success with machine learning. NPJ Digit Med 2022; 5:160. [PMID: 36280681 PMCID: PMC9592575 DOI: 10.1038/s41746-022-00710-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/20/2022] [Accepted: 10/10/2022] [Indexed: 11/09/2022] Open
Abstract
After creation of a new arteriovenous fistula (AVF), assessment of readiness for use is an important clinical task. Accurate prediction of successful use is challenging, and augmentation of the physical exam with ultrasound has become routine. Herein, we propose a point-of-care tool based on machine learning to enhance prediction of successful unassisted radiocephalic arteriovenous fistula (AVF) use. Our analysis includes pooled patient-level data from 704 patients undergoing new radiocephalic AVF creation, eligible for hemodialysis, and enrolled in the 2014-2019 international multicenter PATENCY-1 or PATENCY-2 randomized controlled trials. The primary outcome being predicted is successful unassisted AVF use within 1-year, defined as 2-needle cannulation for hemodialysis for ≥90 days without preceding intervention. Logistic, penalized logistic (lasso and elastic net), decision tree, random forest, and boosted tree classification models were built with a training, tuning, and testing paradigm using a combination of baseline clinical characteristics and 4-6 week ultrasound parameters. Performance assessment includes receiver operating characteristic curves, precision-recall curves, calibration plots, and decision curves. All modeling approaches except the decision tree have similar discrimination performance and comparable net-benefit (area under the ROC curve 0.78-0.81, accuracy 69.1-73.6%). Model performance is superior to Kidney Disease Outcome Quality Initiative and University of Alabama at Birmingham ultrasound threshold criteria. The lasso model is presented as the final model due to its parsimony, retaining only 3 covariates: larger outflow vein diameter, higher flow volume, and absence of >50% luminal stenosis. A point-of-care online calculator is deployed to facilitate AVF assessment in the clinic.
Affiliation(s)
- Patrick Heindel
- Division of Vascular and Endovascular Surgery, Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women’s Hospital, Boston, MA, USA
| | - Tanujit Dey
- Center for Surgery and Public Health, Brigham and Women’s Hospital, Boston, MA, USA
| | - Jessica D. Feliz
- Division of Vascular and Endovascular Surgery, Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women’s Hospital, Boston, MA, USA
| | - Dirk M. Hentschel
- Division of Renal Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Deepak L. Bhatt
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Mohammed Al-Omran
- Division of Vascular Surgery and Li Ka Shing Knowledge Institute, St. Michael’s Hospital, University of Toronto, Toronto, ON, Canada
- Department of Surgery, King Faisal Specialist Hospital and Research Center, Riyadh, Saudi Arabia
| | - Michael Belkin
- Division of Vascular and Endovascular Surgery, Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - C. Keith Ozaki
- Division of Vascular and Endovascular Surgery, Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
| | - Mohamad A. Hussain
- Division of Vascular and Endovascular Surgery, Department of Surgery, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, USA
- Center for Surgery and Public Health, Brigham and Women’s Hospital, Boston, MA, USA
| |
|
24
|
Culture and COVID-19-related mortality: a cross-sectional study of 50 countries. J Public Health Policy 2022; 43:413-430. [PMID: 35995942 PMCID: PMC9395903 DOI: 10.1057/s41271-022-00363-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 07/26/2022] [Indexed: 11/26/2022]
Abstract
Using a cross-sectional sample of 50 countries we investigate the influence of Hofstede’s six-dimensions of culture on COVID-19 related mortality. A multivariable regression model was fitted that controls for health-related, economic- and policy-related variables that have been found to be associated with mortality. We included the percentage of population aged 65 and above, the prevalence of relevant co-morbidities, and tobacco use as health-related variables. Economic variables were GDP, and the connectedness of a country. As policy variables, the Oxford Stringency Index as well as stringency speed, and the Global Health Security Index were used. We also describe the importance of the variables by means of a random forest model. The results suggest that individualistic societies are associated with lower COVID-19-related mortality rates. This finding contradicts previous studies that supported the popular narrative that collectivistic societies with an obedient population are better positioned to manage the pandemic.
|
25
|
Hornung R, Boulesteix AL. Interaction forests: Identifying and exploiting interpretable quantitative and qualitative interaction effects. Comput Stat Data Anal 2022. [DOI: 10.1016/j.csda.2022.107460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|
26
|
|
27
|
A Novel Algorithm to Estimate the Significance Level of a Feature Interaction Using the Extreme Gradient Boosting Machine. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022; 19:ijerph19042338. [PMID: 35206527 PMCID: PMC8871671 DOI: 10.3390/ijerph19042338] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/04/2022] [Revised: 02/16/2022] [Accepted: 02/17/2022] [Indexed: 02/04/2023]
Abstract
Recent studies have revealed the importance of the interaction effect in cardiac research. An analysis would lead to an erroneous conclusion when the approach failed to tackle a significant interaction. Regression models deal with interaction by adding the product of the two interactive variables. Thus, statistical methods could evaluate the significance and contribution of the interaction term. However, machine learning strategies could not provide the p-value of specific feature interaction. Therefore, we propose a novel machine learning algorithm to assess the p-value of a feature interaction, named the extreme gradient boosting machine for feature interaction (XGB-FI). The first step incorporates the concept of statistical methodology by stratifying the original data into four subgroups according to the two interactive features. The second step builds four XGB machines with cross-validation techniques to avoid overfitting. The third step calculates a newly defined feature interaction ratio (FIR) for all possible combinations of predictors. Finally, we calculate the empirical p-value according to the FIR distribution. Computer simulation studies compared the XGB-FI with the multiple regression model with an interaction term. The results showed that the type I error of XGB-FI is valid under the nominal level of 0.05 when there is no interaction effect. The power of XGB-FI is consistently higher than the multiple regression model in all scenarios we examined. In conclusion, the new machine learning algorithm outperforms the conventional statistical model when searching for an interaction.
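The regression baseline mentioned in this abstract, where interaction is modelled by adding the product of the two features, can be illustrated for the special case of two binary features. In that case the data fall into four subgroups, and the least-squares coefficient of the product term equals a difference of differences of the subgroup means. The sketch below is only that regression-side illustration; it is not the XGB-FI algorithm or its feature interaction ratio, and the function name and toy data are hypothetical.

```python
from statistics import mean


def interaction_coefficient(rows):
    """Return b3 of y = b0 + b1*x1 + b2*x2 + b3*x1*x2 for binary x1, x2.

    rows is an iterable of (x1, x2, y) with x1 and x2 in {0, 1}. For this
    saturated model the least-squares b3 is the difference of differences
    of the four subgroup means.
    """
    cells = {(0, 0): [], (0, 1): [], (1, 0): [], (1, 1): []}
    for x1, x2, y in rows:
        cells[(x1, x2)].append(y)
    m = {key: mean(values) for key, values in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Hypothetical noise-free toy data generated from y = 1 + 2*x1 + 3*x2 + 4*x1*x2.
toy = [
    (x1, x2, 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2)
    for x1 in (0, 1)
    for x2 in (0, 1)
    for _ in range(3)
]
print(interaction_coefficient(toy))
```

With noisy data the same quantity is an estimate, and its significance would be assessed with the usual regression test on the product term.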
|
28
|
Productivity-Based Land Suitability and Management Sensitivity Analysis: The Eucalyptus E. urophylla × E. grandis Case. FORESTS 2022. [DOI: 10.3390/f13020340] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Eucalyptus plantations are productive and short rotation forests prevalent in tropical areas that experience fast expansion and face controversies in ecological issues. In this study, we perform a systematic analysis of factors influencing eucalyptus growth through plot records from the National Forest Inventories and satellite images. We find primary restricting factors for eucalyptus growth via machine learning algorithms with random forests and accumulated local effects plots, as conventional forest growth models are inadequate to calculate the causal effect with the large number of environmental and socioeconomic factors. As a result, despite common belief that temperature affects eucalyptus growth the most, we find that precipitation is the most evident restricting factor for eucalyptus growth. We then identify and rank key factors that affect timber growth, such as tree density, rotation period, and wood ownership. Finally, we suggest optimal management and planting strategies for local farmers and policymakers to facilitate eucalyptus growth.
|
29
|
Nasejje JB, Mbuvha R, Mwambi H. Use of a deep learning and random forest approach to track changes in the predictive nature of socioeconomic drivers of under-5 mortality rates in sub-Saharan Africa. BMJ Open 2022; 12:e049786. [PMID: 35177443 PMCID: PMC8860054 DOI: 10.1136/bmjopen-2021-049786] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 01/13/2022] [Indexed: 11/12/2022]
Abstract
OBJECTIVES We used machine learning algorithms to track how the ranks of importance and the survival outcome of four socioeconomic determinants (place of residence, mother's level of education, wealth index and sex of the child) of under-5 mortality rate (U5MR) in sub-Saharan Africa have evolved. SETTINGS This work consists of multiple cross-sectional studies. We analysed data from the Demographic Health Surveys (DHS) collected from four countries; Uganda, Zimbabwe, Chad and Ghana, each randomly selected from the four subregions of sub-Saharan Africa. PARTICIPANTS Each country has multiple DHS datasets and a total of 11 datasets were selected for analysis. A total of n=85 688 children were drawn from the eleven datasets. PRIMARY AND SECONDARY OUTCOMES The primary outcome variable is U5MR; the secondary outcomes were to obtain the ranks of importance of the four socioeconomic factors over time and to compare the two machine learning models, the random survival forest (RSF) and the deep survival neural network (DeepSurv) in predicting U5MR. RESULTS Mother's education level ranked first in five datasets. Wealth index ranked first in three, place of residence ranked first in two and sex of the child ranked last in most of the datasets. The four factors showed a favourable survival outcome over time, confirming that past interventions targeting these factors are yielding positive results. The DeepSurv model has a higher predictive performance with mean concordance indexes (between 67% and 80%), above 50% compared with the RSF model. CONCLUSIONS The study reveals that children under the age of 5 in sub-Saharan Africa have favourable survival outcomes associated with the four socioeconomic factors over time. It also shows that deep survival neural network models are efficient in predicting U5MR and should, therefore, be used in the big data era to draft evidence-based policies to achieve the third sustainable development goal.
Affiliation(s)
- Justine B Nasejje
- Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg-Braamfontein, South Africa
| | - Rendani Mbuvha
- Statistics and Actuarial Science, University of the Witwatersrand, Johannesburg-Braamfontein, South Africa
| | - Henry Mwambi
- School of Mathematics, Statistics and Computer Science, University of Kwazulu-Natal, Pietermaritzburg, South Africa
| |
|
30
|
Walakira A, Ocira J, Duroux D, Fouladi R, Moškon M, Rozman D, Van Steen K. Detecting gene-gene interactions from GWAS using diffusion kernel principal components. BMC Bioinformatics 2022; 23:57. [PMID: 35105309 PMCID: PMC8805268 DOI: 10.1186/s12859-022-04580-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/30/2021] [Accepted: 01/18/2022] [Indexed: 11/10/2022]
Abstract
Genes and gene products do not function in isolation but as components of complex networks of macromolecules through physical or biochemical interactions. Dependencies of gene mutations on genetic background (i.e., epistasis) are believed to play a role in understanding molecular underpinnings of complex diseases such as inflammatory bowel disease (IBD). However, the process of identifying such interactions is complex due to for instance the curse of high dimensionality, dependencies in the data and non-linearity. Here, we propose a novel approach for robust and computationally efficient epistasis detection. We do so by first reducing dimensionality, per gene via diffusion kernel principal components (kpc). Subsequently, kpc gene summaries are used for downstream analysis including the construction of a gene-based epistasis network. We show that our approach is not only able to recover known IBD associated genes but also additional genes of interest linked to this difficult gastrointestinal disease.
Affiliation(s)
- Andrew Walakira
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Junior Ocira
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| | - Diane Duroux
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| | - Ramouna Fouladi
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
| | - Miha Moškon
- Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia
| | - Damjana Rozman
- Centre for Functional Genomics and Bio-Chips, Institute for Biochemistry and Molecular Genetics, Faculty of Medicine, University of Ljubljana, Ljubljana, Slovenia
| | - Kristel Van Steen
- BIO3 - Laboratory for Systems Genetics, GIGA-R Medical Genomics, University of Liège, Liège, Belgium
- BIO3 - Laboratory for Systems Medicine, Department of Human Genetics, KU Leuven, Leuven, Belgium
| |
|
31
|
Zhang L, Wang Y, Chen J, Chen J. RFtest: A Robust and Flexible Community-Level Test for Microbiome Data Powerfully Detects Phylogenetically Clustered Signals. Front Genet 2022; 12:749573. [PMID: 35140735 PMCID: PMC8819960 DOI: 10.3389/fgene.2021.749573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/29/2021] [Accepted: 11/09/2021] [Indexed: 12/31/2022]
Abstract
Random forest is considered as one of the most successful machine learning algorithms, which has been widely used to construct microbiome-based predictive models. However, its use as a statistical testing method has not been explored. In this study, we propose “Random Forest Test” (RFtest), a global (community-level) test based on random forest for high-dimensional and phylogenetically structured microbiome data. RFtest is a permutation test using the generalization error of random forest as the test statistic. Our simulations demonstrate that RFtest has controlled type I error rates, that its power is superior to competing methods for phylogenetically clustered signals, and that it is robust to outliers and adaptive to interaction effects and non-linear associations. Finally, we apply RFtest to two real microbiome datasets to ascertain whether microbial communities are associated or not with the outcome variables.
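The idea of a permutation test whose test statistic is a model's generalization error can be sketched in a few lines. The version below is a simplified illustration under stated assumptions, not the RFtest implementation: a leave-one-out 1-nearest-neighbour error stands in for the random forest generalization error so that the example needs only the standard library, and the function names, the number of permutations, and the toy data are all hypothetical.

```python
import random


def loo_error(X, y):
    """Leave-one-out misclassification rate of a 1-nearest-neighbour rule.

    Stand-in for the generalization error of a random forest.
    """
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        errors += y[best_j] != y[i]
    return errors / len(X)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """P-value: share of label permutations with error <= the observed error."""
    rng = random.Random(seed)
    observed = loo_error(X, y)
    labels = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any association between X and y
        if loo_error(X, labels) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two well-separated groups of ten samples each.
X = [[0.1 * i, 0.0] for i in range(10)] + [[10 + 0.1 * i, 0.0] for i in range(10)]
y = [0] * 10 + [1] * 10
print(permutation_p_value(X, y))
```

A low error on the real labels relative to the permuted ones yields a small p-value, which is the logic the abstract describes at the community level.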
Affiliation(s)
- Lujun Zhang
- Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, United States
- Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou, China
| | - Yanshan Wang
- Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA, United States
| | - Jingwen Chen
- Department of General Surgery, Zhongshan Hospital, Fudan University, Shanghai, China
- *Correspondence: Jingwen Chen; Jun Chen
| | - Jun Chen
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- *Correspondence: Jingwen Chen; Jun Chen
| |
|
32
|
Inglis A, Parnell A, Hurley CB. Visualizing Variable Importance and Variable Interaction Effects in Machine Learning Models. J Comput Graph Stat 2022. [DOI: 10.1080/10618600.2021.2007935] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
Affiliation(s)
- Alan Inglis
- Hamilton Institute, Maynooth University, Maynooth, Ireland
| | - Andrew Parnell
- Hamilton Institute, Insight Centre for Data Analytics, Maynooth University, Maynooth, Ireland
| | - Catherine B. Hurley
- Department of Mathematics and Statistics, Maynooth University, Maynooth, Ireland
| |
|
33
|
Koch TK, Romero P, Stachl C. Age and gender in language, emoji, and emoticon usage in instant messages. COMPUTERS IN HUMAN BEHAVIOR 2022. [DOI: 10.1016/j.chb.2021.106990] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
|
34
|
Nguyen HKD, Fielding MW, Buettel JC, Brook BW. Predicting spatial and seasonal patterns of wildlife–vehicle collisions in high-risk areas. WILDLIFE RESEARCH 2022. [DOI: 10.1071/wr21018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
35
|
Ohanyan H, Portengen L, Huss A, Traini E, Beulens JWJ, Hoek G, Lakerveld J, Vermeulen R. Machine learning approaches to characterize the obesogenic urban exposome. ENVIRONMENT INTERNATIONAL 2022; 158:107015. [PMID: 34991269 DOI: 10.1016/j.envint.2021.107015] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 07/18/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 06/14/2023]
Abstract
BACKGROUND Characteristics of the urban environment may contain upstream drivers of obesity. However, research is lacking that considers the combination of environmental factors simultaneously. OBJECTIVES We aimed to explore what environmental factors of the urban exposome are related to body mass index (BMI), and evaluated the consistency of findings across multiple statistical approaches. METHODS A cross-sectional analysis was conducted using baseline data from 14,829 participants of the Occupational and Environmental Health Cohort study. BMI was obtained from self-reported height and weight. Geocoded exposures linked to individual home addresses (using 6-digit postcode) of 86 environmental factors were estimated, including air pollution, traffic noise, green-space, built environmental and neighborhood socio-demographic characteristics. Exposure-obesity associations were identified using the following approaches: sparse group Partial Least Squares, Bayesian Model Averaging, penalized regression using the Minimax Concave Penalty, Generalized Additive Model-based boosting, Random Forest, Extreme Gradient Boosting, and Multiple Linear Regression, as the most conventional approach. The models were adjusted for individual socio-demographic variables. Environmental factors were ranked according to variable importance scores attributed by each approach and median ranks were calculated across these scores to identify the most consistent associations. RESULTS The most consistent environmental factors associated with BMI were the average neighborhood value of the homes, oxidative potential of particulate matter air pollution (OP), healthy food outlets in the neighborhood (5 km buffer), low-income neighborhoods, and one-person households in the neighborhood. Higher BMI levels were observed in low-income neighborhoods, with lower average house values, lower share of one-person households, smaller amounts of healthy food retailers and higher OP levels. Across the approaches, we observed consistent patterns of results based on model's capacity to incorporate linear or nonlinear associations. DISCUSSION The pluralistic analysis on environmental obesogens strengthens the existing evidence on the role of neighborhood socioeconomic position, urbanicity and air pollution.
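The ranking step described in the methods, where each approach ranks the environmental factors by its own importance scores and the median rank across approaches identifies the most consistent factors, might look roughly like the sketch below. The approach labels, factor names, and importance values are hypothetical placeholders and not results from the study; ties within an approach are not handled.

```python
from statistics import median


def ranks_from_scores(scores):
    """Map each factor to its rank (1 = most important) within one approach."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {factor: position for position, factor in enumerate(ordered, start=1)}


def median_ranks(importance_by_approach):
    """Median rank of every factor across all approaches."""
    per_approach = [ranks_from_scores(s) for s in importance_by_approach.values()]
    factors = per_approach[0].keys()
    return {f: median(r[f] for r in per_approach) for f in factors}


# Hypothetical importance scores; each approach uses its own scale,
# which is why ranks rather than raw scores are aggregated.
importance = {
    "approach_a": {"house_value": 0.9, "oxidative_potential": 0.5, "food_outlets": 0.1},
    "approach_b": {"house_value": 0.4, "oxidative_potential": 0.7, "food_outlets": 0.2},
    "approach_c": {"house_value": 0.8, "oxidative_potential": 0.3, "food_outlets": 0.6},
}
print(median_ranks(importance))
```

Aggregating ranks instead of scores keeps approaches with incomparable importance scales from dominating the summary.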
Affiliation(s)
- Haykanush Ohanyan
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands; Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands; Upstream Team, www.upstreamteam.nl. Amsterdam UMC, VU University Amsterdam, Amsterdam, Noord-Holland, the Netherlands.
| | - Lützen Portengen
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
| | - Anke Huss
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
| | - Eugenio Traini
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
| | - Joline W J Beulens
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands; Upstream Team, www.upstreamteam.nl. Amsterdam UMC, VU University Amsterdam, Amsterdam, Noord-Holland, the Netherlands; Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Gerard Hoek
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
| | - Jeroen Lakerveld
- Department of Epidemiology and Data Science, Amsterdam Public Health Research Institute, Amsterdam UMC, Vrije Universiteit Amsterdam, Amsterdam, Noord-Holland, the Netherlands; Upstream Team, www.upstreamteam.nl. Amsterdam UMC, VU University Amsterdam, Amsterdam, Noord-Holland, the Netherlands
| | - Roel Vermeulen
- Institute for Risk Assessment Sciences, Utrecht University, Utrecht, Utrecht, the Netherlands
| |
|
36
|
Hornung R. Diversity Forests: Using Split Sampling to Enable Innovative Complex Split Procedures in Random Forests. SN Comput Sci 2021; 3:1. [PMID: 34723205 PMCID: PMC8533673 DOI: 10.1007/s42979-021-00920-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/26/2021] [Accepted: 10/02/2021] [Indexed: 11/24/2022]
Abstract
The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackling practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for $l = 1, \dots, nsplits$: (1) sample one split problem; (2) sample a single or few splits from the split problem sampled in (1) and add this or these splits to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. It is seen that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook.
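The two-stage candidate split sampling described in this abstract can be sketched for univariable, binary splitting. The sketch assumes that each split problem is the set of all thresholds on one feature, that both sampling stages are uniform, and that exactly one split is drawn per round; these are illustrative assumptions and the function names are hypothetical, so this is not the reference implementation.

```python
import random


def univariable_split_problems(X):
    """One split problem per feature: all midpoints between its sorted unique values."""
    problems = []
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
        if thresholds:  # a constant feature offers no split
            problems.append([(j, t) for t in thresholds])
    return problems


def sample_candidate_splits(split_problems, nsplits, rng):
    """Build the candidate split set with nsplits rounds of two-stage sampling."""
    candidates = []
    for _ in range(nsplits):
        problem = rng.choice(split_problems)    # (1) sample one split problem
        candidates.append(rng.choice(problem))  # (2) sample a single split from it
    return candidates


# Hypothetical toy node data with two features.
X = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
problems = univariable_split_problems(X)
candidates = sample_candidate_splits(problems, nsplits=5, rng=random.Random(0))
print(candidates)
```

The best split for the node would then be chosen from `candidates` by the usual split criterion; for a bivariable procedure only the construction of the split problems would change.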
Affiliation(s)
- Roman Hornung
- Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Marchioninistr. 15, 81377 Munich, Germany
| |
|
37
|
Sykes AL, Silva GS, Holtkamp DJ, Mauch BW, Osemeke O, Linhares DCL, Machado G. Interpretable machine learning applied to on-farm biosecurity and porcine reproductive and respiratory syndrome virus. Transbound Emerg Dis 2021; 69:e916-e930. [PMID: 34719136 DOI: 10.1111/tbed.14369] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 06/14/2021] [Revised: 09/22/2021] [Accepted: 10/24/2021] [Indexed: 11/28/2022]
Abstract
Effective biosecurity practices in swine production are key in preventing the introduction and dissemination of infectious pathogens. Ideally, on-farm biosecurity practices should be chosen by their impact on bio-containment and bio-exclusion; however, quantitative supporting evidence is often unavailable. Therefore, the development of methodologies capable of quantifying and ranking biosecurity practices according to their efficacy in reducing disease risk has the potential to facilitate better-informed choices of biosecurity practices. Using survey data on biosecurity practices, farm demographics, and previous outbreaks from 139 herds, a set of machine learning algorithms were trained to classify farms by porcine reproductive and respiratory syndrome virus status, depending on their biosecurity practices and farm demographics, to produce a predicted outbreak risk. A novel interpretable machine learning toolkit, MrIML-biosecurity, was developed to benchmark farms and production systems by predicted risk and quantify the impact of biosecurity practices on disease risk at individual farms. By quantifying the variable impact on predicted risk, 50% of 42 variables were associated with fomite spread while 31% were associated with local transmission. Results from machine learning interpretations identified similar results, finding substantial contribution to predicted outbreak risk from biosecurity practices relating to the turnover and number of employees, the surrounding density of swine premises and pigs, the sharing of haul trailers, distance from the public road and farm production type. In addition, the development of individualized biosecurity assessments provides the opportunity to better guide biosecurity implementation on a case-by-case basis. 
Finally, the flexibility of the MrIML-biosecurity toolkit gives it the potential to be applied to wider areas of biosecurity benchmarking, to address biosecurity weaknesses in other livestock systems and industry-relevant diseases.
Affiliation(s)
- Abagael L Sykes
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
| | - Gustavo S Silva
- Veterinary Diagnostic and Production Animal Medicine Department, College of Veterinary Medicine, Iowa State University, Ames, Iowa, USA
| | - Derald J Holtkamp
- Veterinary Diagnostic and Production Animal Medicine Department, College of Veterinary Medicine, Iowa State University, Ames, Iowa, USA
| | - Broc W Mauch
- Veterinary Diagnostic and Production Animal Medicine Department, College of Veterinary Medicine, Iowa State University, Ames, Iowa, USA
| | - Onyekachukwu Osemeke
- Veterinary Diagnostic and Production Animal Medicine Department, College of Veterinary Medicine, Iowa State University, Ames, Iowa, USA
| | - Daniel C L Linhares
- Veterinary Diagnostic and Production Animal Medicine Department, College of Veterinary Medicine, Iowa State University, Ames, Iowa, USA
| | - Gustavo Machado
- Department of Population Health and Pathobiology, College of Veterinary Medicine, North Carolina State University, Raleigh, North Carolina, USA
| |
|
38
|
DiMucci D, Kon M, Segrè D. BowSaw: Inferring Higher-Order Trait Interactions Associated With Complex Biological Phenotypes. Front Mol Biosci 2021; 8:663532. [PMID: 34222331 PMCID: PMC8245782 DOI: 10.3389/fmolb.2021.663532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 05/24/2021] [Indexed: 11/15/2022]
Abstract
Machine learning is helping the interpretation of biological complexity by enabling the inference and classification of cellular, organismal and ecological phenotypes based on large datasets, e.g., from genomic, transcriptomic and metagenomic analyses. A number of available algorithms can help search these datasets to uncover patterns associated with specific traits, including disease-related attributes. While, in many instances, treating an algorithm as a black box is sufficient, it is interesting to pursue an enhanced understanding of how system variables end up contributing to a specific output, as an avenue toward new mechanistic insight. Here we address this challenge through a suite of algorithms, named BowSaw, which takes advantage of the structure of a trained random forest algorithm to identify combinations of variables (“rules”) frequently used for classification. We first apply BowSaw to a simulated dataset and show that the algorithm can accurately recover the sets of variables used to generate the phenotypes through complex Boolean rules, even under challenging noise levels. We next apply our method to data from the integrative Human Microbiome Project and find previously unreported high-order combinations of microbial taxa putatively associated with Crohn’s disease. By leveraging the structure of trees within a random forest, BowSaw provides a new way of using decision trees to generate testable biological hypotheses.
Affiliation(s)
- Demetrius DiMucci
- Bioinformatics Graduate Program, Boston University, Boston, MA, United States
- Biological Design Center, Boston University, Boston, MA, United States
| | - Mark Kon
- Bioinformatics Graduate Program, Boston University, Boston, MA, United States
- Department of Mathematics and Statistics, Boston University, Boston, MA, United States
| | - Daniel Segrè
- Bioinformatics Graduate Program, Boston University, Boston, MA, United States
- Biological Design Center, Boston University, Boston, MA, United States
- Department of Biology, Boston University, Boston, MA, United States
- Department of Biomedical Engineering, Boston University, Boston, MA, United States
- Department of Physics, Boston University, Boston, MA, United States
| |
|
39
|
Hamlet A, Ramos DG, Gaythorpe KAM, Romano APM, Garske T, Ferguson NM. Seasonality of agricultural exposure as an important predictor of seasonal yellow fever spillover in Brazil. Nat Commun 2021; 12:3647. [PMID: 34131128 PMCID: PMC8206143 DOI: 10.1038/s41467-021-23926-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 12/12/2019] [Accepted: 05/24/2021] [Indexed: 01/04/2023]
Abstract
Yellow fever virus (YFV) is a zoonotic arbovirus affecting both humans and non-human primates (NHP's) in Africa and South America. Previous descriptions of YF's seasonality have relied purely on climatic explanations, despite the high proportion of cases occurring in people involved in agriculture. We use a series of random forest classification models to predict the monthly occurrence of YF in humans and NHP's across Brazil, by fitting four classes of covariates related to the seasonality of climate and agriculture (planting and harvesting), crop output and host demography. We find that models captured seasonal YF reporting in humans and NHPs when they considered seasonality of agriculture rather than climate, particularly for monthly aggregated reports. These findings illustrate the seasonality of exposure, through agriculture, as a component of zoonotic spillover. Additionally, by highlighting crop types and anthropogenic seasonality, these results could directly identify areas at highest risk of zoonotic spillover.
Affiliation(s)
- Arran Hamlet
- MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, UK.
- Katy A M Gaythorpe
- MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, UK
- Tini Garske
- MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, UK
- Neil M Ferguson
- MRC Centre for Global Infectious Disease Analysis; and the Abdul Latif Jameel Institute for Disease and Emergency Analytics, School of Public Health, Imperial College London, London, UK
|
40
|
Askland KD, Strong D, Wright MN, Moore JH. The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures. Genet Epidemiol 2021; 45:485-536. [PMID: 33942369 DOI: 10.1002/gepi.22383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2020] [Revised: 03/05/2021] [Accepted: 03/23/2021] [Indexed: 11/08/2022]
Abstract
The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset-/pathways-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case-control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.
Affiliation(s)
- Kathleen D Askland
- Waypoint Centre for Mental Health Care Penetanguishene, University of Toronto, Toronto, Ontario, Canada
- David Strong
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California, USA
- Marvin N Wright
- Department Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS GmbH, Germany
- Jason H Moore
- Department of Biostatistics, Epidemiology, & Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
|
41
|
Epistasis Analysis: Classification Through Machine Learning Methods. Methods Mol Biol 2021. [PMID: 33733366 DOI: 10.1007/978-1-0716-0947-7_21] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Complex diseases differ from Mendelian disorders: their development usually involves the interaction of multiple genes or the interaction between genes and the environment (i.e. epistasis). Although high-throughput sequencing technologies for complex diseases have produced a large amount of data, these data are extremely difficult to analyze because of the high feature dimensionality and the many feature combinations that epistasis analysis must consider. In this work, we introduce machine learning methods to effectively reduce the gene dimensionality, retain the key epistatic effects, and effectively characterize the relationship between epistatic effects and complex diseases.
|
42
|
García de la Garza Á, Blanco C, Olfson M, Wall MM. Identification of Suicide Attempt Risk Factors in a National US Survey Using Machine Learning. JAMA Psychiatry 2021; 78:398-406. [PMID: 33404590 PMCID: PMC7788508 DOI: 10.1001/jamapsychiatry.2020.4165] [Citation(s) in RCA: 74] [Impact Index Per Article: 18.5] [Indexed: 12/11/2022]
Abstract
IMPORTANCE Because more than one-third of people making nonfatal suicide attempts do not receive mental health treatment, it is essential to extend suicide attempt risk factors beyond high-risk clinical populations to the general adult population. OBJECTIVE To identify future suicide attempt risk factors in the general population using a data-driven machine learning approach including more than 2500 questions from a large, nationally representative survey of US adults. DESIGN, SETTING, AND PARTICIPANTS Data came from wave 1 (2001 to 2002) and wave 2 (2004 to 2005) of the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC). NESARC is a face-to-face longitudinal survey conducted with a national representative sample of noninstitutionalized civilian population 18 years and older in the US. The cumulative response rate across both waves was 70.2% resulting in 34 653 wave 2 interviews. A balanced random forest was trained using cross-validation to develop a suicide attempt risk model. Out-of-fold model prediction was used to assess model performance, including the area under the receiver operator curve, sensitivity, and specificity. Survey design and nonresponse weights allowed estimates to be representative of the US civilian population based on the 2000 census. Analyses were performed between May 15, 2019, and June 10, 2020. MAIN OUTCOMES AND MEASURES Attempted suicide in the 3 years between wave 1 and wave 2 interviews. RESULTS Of 34 653 participants, 20 089 were female (weighted proportion, 52.1%). The weighted mean (SD) age was 45.1 (17.3) years at wave 1 and 48.2 (17.3) years at wave 2. Attempted suicide during the 3 years between wave 1 and wave 2 interviews was self-reported by 222 of 34 653 participants (0.6%). 
Using survey questions measured at wave 1, the suicide attempt risk model yielded a cross-validated area under the receiver operator characteristic curve of 0.857 with a sensitivity of 85.3% (95% CI, 79.8-89.7) and a specificity of 73.3% (95% CI, 72.8-73.8) at an optimized threshold. The model identified 1.8% of the US population to be at a 10% or greater risk of suicide attempt. The most important risk factors were 3 questions about previous suicidal ideation or behavior; 3 items from the 12-Item Short Form Health Survey, namely feeling downhearted, doing activities less carefully, or accomplishing less because of emotional problems; younger age; lower educational achievement; and recent financial crisis. CONCLUSIONS AND RELEVANCE In this study, after searching through more than 2500 survey questions, several well-known risk factors of suicide attempt were confirmed, such as previous suicidal behaviors and ideation, and new risks were identified, including functional impairment resulting from mental disorders and socioeconomic disadvantage. These results may help guide future clinical assessment and the development of new suicide risk scales.
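The performance measures reported in this abstract (area under the receiver operating characteristic curve, and sensitivity and specificity at a chosen threshold) can be illustrated with a minimal sketch. The scores, labels, and threshold below are invented toy values, not data from the study, and the function names are hypothetical helpers rather than the authors' code.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify as positive when score >= threshold, then compare with labels."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy out-of-fold risk scores (hypothetical): 3 positives, 5 negatives.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

auc = roc_auc(scores, labels)
sens, spec = sensitivity_specificity(scores, labels, threshold=0.5)
print(f"AUC={auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Lowering the threshold trades specificity for sensitivity, which is why the abstract reports both values at a single optimized threshold alongside the threshold-free AUC.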
Affiliation(s)
- Carlos Blanco
- Division of Epidemiology, Services and Prevention Research, National Institute on Drug Abuse, Bethesda, Maryland
- Mark Olfson
- Department of Psychiatry, New York State Psychiatric Institute, Columbia University Medical Center, New York
- Melanie M. Wall
- Department of Biostatistics, Columbia University, New York, New York; Department of Psychiatry, New York State Psychiatric Institute, Columbia University Medical Center, New York
|
43
|
Gola D, König IR. Empowering individual trait prediction using interactions for precision medicine. BMC Bioinformatics 2021; 22:74. [PMID: 33602124 PMCID: PMC7890638 DOI: 10.1186/s12859-021-04011-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/10/2019] [Accepted: 02/08/2021] [Indexed: 11/11/2022] Open
Abstract
Background One component of precision medicine is to construct prediction models with their predictive ability as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases like coronary artery disease, rheumatoid arthritis, and type 2 diabetes, have a polygenic basis and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal and machine learning methods are appealing to make individual predictions. However, most of these algorithms focus mostly on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in the realm of genetic epidemiology. One big class of algorithms to detect interacting features is based on the multifactor dimensionality reduction (MDR). Here, we further develop the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction empowered individual prediction. Results Using a comprehensive simulation study we show that our new algorithm (median AUC: 0.66) can use information hidden in interactions and outperforms two other state-of-the-art algorithms, namely the Random Forest (median AUC: 0.54) and Elastic Net (median AUC: 0.50), if interactions are present in a scenario of two pairs of two features having small effects. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. 
As our new algorithm is not only applicable to biological/genetic data but to all datasets with discrete features, it may have practical implications in other research fields where interactions between features have to be considered as well, and we made our method available as an R package (https://github.com/imbs-hl/MBMDRClassifieR). Conclusions The explicit use of interactions between features can improve the prediction performance and thus should be included in further attempts to move precision medicine forward.
Affiliation(s)
- Damian Gola
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.
|
44
|
Orlenko A, Moore JH. A comparison of methods for interpreting random forest models of genetic association in the presence of non-additive interactions. BioData Min 2021; 14:9. [PMID: 33514397 PMCID: PMC7847145 DOI: 10.1186/s13040-021-00243-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/01/2020] [Accepted: 01/13/2021] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Non-additive interactions among genes are frequently associated with a number of phenotypes, including known complex diseases such as Alzheimer's, diabetes, and cardiovascular disease. Detecting interactions requires careful selection of analytical methods, and some machine learning algorithms are unable or underpowered to detect or model feature interactions that exhibit non-additivity. The Random Forest method is often employed in these efforts due to its ability to detect and model non-additive interactions. In addition, Random Forest has the built-in ability to estimate feature importance scores, a characteristic that allows the model to be interpreted with the order and effect size of the feature association with the outcome. This characteristic is very important for epidemiological and clinical studies where results of predictive modeling could be used to define the future direction of the research efforts. An alternative way to interpret the model is with a permutation feature importance metric, which employs a permutation approach to calculate a feature contribution coefficient in units of the decrease in the model's performance, and with Shapley additive explanations, which employ a cooperative game theory approach. Currently, it is unclear which Random Forest feature importance metric provides a superior estimation of the true informative contribution of features in genetic association analysis. RESULTS To address this issue, and to improve interpretability of Random Forest predictions, we compared different methods for feature importance estimation in real and simulated datasets with non-additive interactions. As a result, we detected a discrepancy between the metrics for the real-world datasets and further established that the permutation feature importance metric provides more precise feature importance rank estimation for the simulated datasets with non-additive interactions. 
CONCLUSIONS By analyzing both real and simulated data, we established that the permutation feature importance metric provides more precise feature importance rank estimation in the presence of non-additive interactions.
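The permutation feature importance idea described in this abstract (importance measured as the drop in model performance after shuffling one feature) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the "model" is a hand-written XOR rule standing in for a fitted random forest, the data are simulated, and all names are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == target for row, target in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean decrease in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Simulated data: the outcome is a purely non-additive (XOR) function of
# features 0 and 1; feature 2 is noise.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 1) for _ in range(3)] for _ in range(200)]
y = [row[0] ^ row[1] for row in X]


def model(row):
    # Stand-in for a fitted classifier (assumption: it learned the XOR rule).
    return row[0] ^ row[1]


importances = permutation_importance(model, X, y)
print(importances)
```

In this toy setting neither interacting feature has a marginal effect on its own, yet shuffling either one degrades accuracy, while the noise feature receives an importance of zero.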
Affiliation(s)
- Alena Orlenko
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA.
|
45
|
Martins AS, Neves LA, de Faria PR, Tosta TAA, Longo LC, Silva AB, Roberto GF, do Nascimento MZ. A Hermite polynomial algorithm for detection of lesions in lymphoma images. Pattern Anal Appl 2020. [DOI: 10.1007/s10044-020-00927-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022]
|
46
|
Affiliation(s)
- Tim C. D. Lucas
- Big Data Institute, University of Oxford, Old Road Campus, Oxford OX3 7LF, United Kingdom
|
47
|
McWilliam A, Khalifa J, Vasquez Osorio E, Banfill K, Abravan A, Faivre-Finn C, van Herk M. Novel Methodology to Investigate the Effect of Radiation Dose to Heart Substructures on Overall Survival. Int J Radiat Oncol Biol Phys 2020; 108:1073-1081. [PMID: 32585334 DOI: 10.1016/j.ijrobp.2020.06.031] [Citation(s) in RCA: 82] [Impact Index Per Article: 16.4] [Received: 01/22/2020] [Revised: 05/18/2020] [Accepted: 06/17/2020] [Indexed: 12/25/2022]
Abstract
PURPOSE For patients with lung cancer treated with radiation therapy, a dose to the heart is associated with excess mortality; however, it is often not feasible to spare the whole heart. Our aim is to define cardiac substructures and dose thresholds that optimally reduce early mortality. METHODS AND MATERIALS Fourteen cardiac substructures were delineated on 5 template patients with representative anatomies. One thousand one hundred sixty-one patients with non-small cell lung cancer were registered nonrigidly to these 5 template anatomies, and their radiation therapy doses were mapped. Mean and maximum dose to each substructure were extracted, and the means were evaluated as input to prediction models. The cohort was bootstrapped into 2 variable reduction techniques: elastic net least absolute shrinkage and selection operator and the random survival forest model. Each method was optimized to extract variables contributing most to overall survival, and model coefficients were evaluated to select these substructures. The most important variables common to both models were selected and evaluated in multivariable Cox-proportional hazard models. A threshold dose was defined, and Kaplan-Meier survival curves plotted. RESULTS Nine hundred seventy-eight patients remained after visual quality assurance of the registration. Ranking the model coefficients across the bootstraps selected the maximum dose to the right atrium, right coronary artery, and ascending aorta as the most important factors associated with survival. The maximum dose to the combined cardiac region showed significance in the multivariable model, a hazard ratio of 1.01/Gy, and P = .03 after accounting for tumor volume (P < .001), N stage (P < .01), and performance status (P = .01). The optimal threshold for the maximum dose, equivalent dose in 2-Gy fractions, was 23 Gy. Kaplan-Meier survival curves showed a significant split (log-rank P = .008). 
CONCLUSIONS The maximum dose to the combined cardiac region encompassing the right atrium, right coronary artery, and ascending aorta was found to have the greatest effect on patient survival. A maximum equivalent dose in 2-Gy fractions of 23 Gy was identified for consideration as a dose limit in future studies.
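The Kaplan-Meier survival curves mentioned in this abstract are built from a product-limit estimate. A minimal sketch of that estimator is given below; the follow-up times and event indicators are invented toy values, not patient data from the study, and the function name is a hypothetical helper.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject
    events: 1 if the event (death) was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each observed event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve


# Toy cohort (hypothetical): time in months, 1 = death, 0 = censored.
times = [2, 3, 3, 5, 8, 8, 12]
events = [1, 1, 0, 1, 0, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 4))
```

To compare groups split at a dose threshold, the same estimator would be applied separately to the patients above and below the threshold; the log-rank test used in the abstract is not implemented here.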
Affiliation(s)
- Alan McWilliam
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom.
- Jonathan Khalifa
- Department of Radiation Oncology, Institut Universitaire du Cancer de Toulouse, Toulouse, France
- Eliana Vasquez Osorio
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Kathryn Banfill
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Azadeh Abravan
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Corinne Faivre-Finn
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom
- Marcel van Herk
- Division of Clinical Cancer Science, School of Medical Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, United Kingdom; Department of Radiotherapy Related Research, The Christie NHS Foundation Trust, Manchester, United Kingdom
|
48
|
Malten J, König IR. Modified entropy-based procedure detects gene-gene-interactions in unconventional genetic models. BMC Med Genomics 2020; 13:65. [PMID: 32326960 PMCID: PMC7181579 DOI: 10.1186/s12920-020-0703-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 02/19/2019] [Accepted: 03/13/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Since it is assumed that genetic interactions play an important role in understanding the mechanisms of complex diseases, different statistical approaches have been suggested in recent years for this task. One interesting approach is the entropy-based IGENT method by Kwon et al. that promises an efficient detection of main effects and interaction effects simultaneously. However, a modification is required if the aim is to only detect interaction effects. METHODS Based on the IGENT method, we present a modification that leads to a conditional mutual information based approach under the condition of linkage equilibrium. The modified estimator is investigated in a comprehensive simulation based on five genetic interaction models and applied to real data from the genome-wide association study by the North American Rheumatoid Arthritis Consortium (NARAC). RESULTS The presented modification of IGENT controls the type I error in all simulated constellations. Furthermore, it provides high power for detecting pure interactions specifically on unconventional genetic models both in simulation and real data. CONCLUSIONS The proposed method uses the IGENT software, which is freely available, simple and fast, and detects pure interactions on unconventional genetic models. Our results demonstrate that this modification is an attractive complement to established analysis methods.
Affiliation(s)
- Jörg Malten
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, 23562 Lübeck, Germany.
|
49
|
Gola D, Erdmann J, Müller‐Myhsok B, Schunkert H, König IR. Polygenic risk scores outperform machine learning methods in predicting coronary artery disease status. Genet Epidemiol 2020; 44:125-138. [DOI: 10.1002/gepi.22279] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/29/2019] [Revised: 12/05/2019] [Accepted: 12/23/2019] [Indexed: 02/06/2023]
Affiliation(s)
- Damian Gola
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Lübeck, Germany
- Bertram Müller‐Myhsok
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, Munich, Germany
- Heribert Schunkert
- Deutsches Herzzentrum München, Technische Universität München, München, Germany
- Inke R. König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Lübeck, Germany
|
50
|
Abstract
There has been considerable development in machine learning in recent years with some remarkable successes. Although there are many high-performance methods, the interpretation of learning models remains challenging. Understanding the underlying theory behind the specific prediction of various models is difficult. Various studies have attempted to explain the working principle behind learning models using techniques like feature importance, partial dependency, feature interaction, and the Shapley value. This study introduces a new feature interaction measure. While recent studies have measured feature interaction using partial dependency, this study redefines feature interaction in terms of prediction performance. The proposed measure is easy to interpret, faster than partial dependency-based measures, and useful to explain feature interaction, which affects prediction performance in both regression and classification models.
|