1
Song C, Shi Y, Li M, Wu L, Xiong X, Liu J, Xia D. An efficient catalyst screening strategy combining machine learning and causal inference. Journal of Environmental Management 2025; 377:124665. [PMID: 39999759 DOI: 10.1016/j.jenvman.2025.124665] [Received: 12/13/2024] [Revised: 02/10/2025] [Accepted: 02/18/2025] [Indexed: 02/27/2025]
Abstract
The diversity of catalyst synthesis methods makes catalyst optimization by traditional experimental methods increasingly challenging. This study presents a new strategy for determining catalyst performance that feeds causal inference results into machine learning models as prior knowledge. The strategy was used to explore the correlation between the ratio of nitrogen functional groups in catalysts and degradation performance, with the aim of addressing the low efficiency of catalyst screening. A dataset comprising 14 critical parameters, including the physicochemical properties of catalysts and reaction conditions, was established through the analysis of 182 experimental results. Analysis of the real data showed that the CatBoost model performed best (R² = 0.953, MAE = 3.277, RMSE = 5.615). SHAP analysis showed that pyridinic N was a key N-functional group affecting the degradation performance of BPA. DoWhy causal inference further verified the positive effect of pyridinic N, with a causal effect estimate of 0.4388. The strategy narrows the range of candidate catalysts through causal inference pre-screening and then uses the CatBoost model to evaluate catalyst performance accurately, which can reduce catalyst screening from multiple processes to a single process and significantly improve catalyst selection efficiency.
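For readers unfamiliar with the three scores quoted in this abstract (R², MAE, RMSE), the following is a minimal sketch of how they are conventionally computed from observed and predicted values. The function name and the sample numbers are hypothetical illustrations, not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, MAE, RMSE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    rmse = math.sqrt(ss_res / n)
    # R2 is undefined when every observed value is identical (ss_tot == 0).
    r2 = 1.0 - ss_res / ss_tot
    return r2, mae, rmse


# Hypothetical degradation efficiencies (%), NOT data from the study.
observed = [90.0, 75.0, 60.0, 85.0]
predicted = [88.0, 78.0, 63.0, 84.0]
r2, mae, rmse = regression_metrics(observed, predicted)
```

MAE and RMSE are in the units of the target, so the values reported above are only interpretable relative to the scale of the degradation measure used in the paper.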
Affiliation(s)
- Chenyu Song
  - Engineering Research Center for Clean Production of Textile Dyeing and Printing, Ministry of Education, Wuhan Textile University, Wuhan, 430073, PR China
- Yintao Shi
  - Engineering Research Center for Clean Production of Textile Dyeing and Printing, Ministry of Education, Wuhan Textile University, Wuhan, 430073, PR China; School of Environmental Engineering, Wuhan Textile University, Wuhan, 430073, PR China
- Meng Li
  - Engineering Research Center for Clean Production of Textile Dyeing and Printing, Ministry of Education, Wuhan Textile University, Wuhan, 430073, PR China; Textile Pollution Controlling Engineering Centre of Ministry of Ecology and Environment, College of Environmental Science and Engineering, Donghua University, Shanghai, 201620, PR China
- Lin Wu
  - Engineering Research Center for Clean Production of Textile Dyeing and Printing, Ministry of Education, Wuhan Textile University, Wuhan, 430073, PR China
- Xiaorong Xiong
  - School of Computing, Huanggang Normal University, Huanggang, 438000, PR China
- Jianyun Liu
  - Textile Pollution Controlling Engineering Centre of Ministry of Ecology and Environment, College of Environmental Science and Engineering, Donghua University, Shanghai, 201620, PR China
- Dongsheng Xia
  - Engineering Research Center for Clean Production of Textile Dyeing and Printing, Ministry of Education, Wuhan Textile University, Wuhan, 430073, PR China
2
Rieckmann A, Nielsen S, Dworzynski P, Amini H, Mogensen SW, Silva IB, Chang AY, Arah OA, Samek W, Rod NH, Ekstrøm CT, Benn CS, Aaby P, Fisker AB. Discovering Subgroups of Children With High Mortality in Urban Guinea-Bissau: Exploratory and Validation Cohort Study. JMIR Public Health Surveill 2024; 10:e48060. [PMID: 38592761 PMCID: PMC11040440 DOI: 10.2196/48060] [Received: 04/18/2023] [Revised: 12/22/2023] [Accepted: 01/23/2024] [Indexed: 04/10/2024]
Abstract
BACKGROUND The decline in global child mortality is an important public health achievement, yet child mortality remains disproportionately high in many low-income countries like Guinea-Bissau. The persisting high mortality rates necessitate targeted research to identify vulnerable subgroups of children and formulate effective interventions. OBJECTIVE This study aimed to discover subgroups of children at an elevated risk of mortality in the urban setting of Bissau, Guinea-Bissau, West Africa. By identifying these groups, we intend to provide a foundation for developing targeted health interventions and inform public health policy. METHODS We used data from the health and demographic surveillance site, Bandim Health Project, covering 2003 to 2019. We identified baseline variables recorded before children reached the age of 6 weeks. The focus was on determining factors consistently linked with increased mortality up to the age of 3 years. Our multifaceted methodological approach incorporated spatial analysis for visualizing geographical variations in mortality risk, causally adjusted regression analysis to single out specific risk factors, and machine learning techniques for identifying clusters of multifactorial risk factors. To ensure robustness and validity, we divided the data set temporally, assessing the persistence of identified subgroups over different periods. The reassessment of mortality risk used the targeted maximum likelihood estimation (TMLE) method to achieve more robust causal modeling. RESULTS We analyzed data from 21,005 children. The mortality risk (6 weeks to 3 years of age) was 5.2% (95% CI 4.8%-5.6%) for children born between 2003 and 2011, and 2.9% (95% CI 2.5%-3.3%) for children born between 2012 and 2016. Our findings revealed 3 distinct high-risk subgroups with notably higher mortality rates: children residing in a specific urban area (adjusted mortality risk difference of 3.4%, 95% CI 0.3%-6.5%), children born to mothers with no prenatal consultations (adjusted mortality risk difference of 5.8%, 95% CI 2.6%-8.9%), and children from polygamous families born during the dry season (adjusted mortality risk difference of 1.7%, 95% CI 0.4%-2.9%). These subgroups, though small, showed a consistent pattern of higher mortality risk over time. Common social and economic factors were linked to a larger share of the total child deaths. CONCLUSIONS The study's results underscore the need for targeted interventions to address the specific risks faced by these identified high-risk subgroups. These interventions should be designed to complement broader public health strategies, creating a comprehensive approach to reducing child mortality. We suggest future research that focuses on developing, testing, and comparing targeted intervention strategies to examine the hypotheses proposed in this study. The ultimate aim is to optimize health outcomes for all children in high-mortality settings, leveraging a strategic mix of targeted and general health interventions to address the varied needs of different child subgroups.
Affiliation(s)
- Andreas Rieckmann
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Sebastian Nielsen
  - Bandim Health Project, INDEPTH Network, Bissau, Guinea-Bissau
  - Bandim Health Project, Research unit Odense Patient Data Explorative Network (OPEN), Department of Clinical Research, Odense University Hospital and University of Southern Denmark, Odense, Denmark
- Piotr Dworzynski
  - Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Heresh Amini
  - Department of Environmental Medicine and Climate Science, Icahn School of Medicine at Mount Sinai, New York, NY, United States
  - Institute for Climate Change, Environmental Health, and Exposomics, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Isaquel Bartolomeu Silva
  - Bandim Health Project, INDEPTH Network, Bissau, Guinea-Bissau
  - Bandim Health Project, Research unit Odense Patient Data Explorative Network (OPEN), Department of Clinical Research, Odense University Hospital and University of Southern Denmark, Odense, Denmark
- Angela Y Chang
  - Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
  - The Interdisciplinary Centre on Population Dynamics, University of Southern Denmark, Odense, Denmark
- Onyebuchi A Arah
  - Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, Los Angeles, CA, United States
  - Department of Statistics and Data Science, College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, United States
  - Research Unit for Epidemiology, Department of Public Health, University of Aarhus, Aarhus, Denmark
- Wojciech Samek
  - Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany
  - Department of Electrical Engineering and Computer Science, Technical University of Berlin, Berlin, Germany
  - Berlin Institute for the Foundations of Learning and Data, Berlin, Germany
- Naja Hulvej Rod
  - Section of Epidemiology, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Claus Thorn Ekstrøm
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Christine Stabell Benn
  - Bandim Health Project, INDEPTH Network, Bissau, Guinea-Bissau
  - Bandim Health Project, Research unit Odense Patient Data Explorative Network (OPEN), Department of Clinical Research, Odense University Hospital and University of Southern Denmark, Odense, Denmark
  - Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
- Peter Aaby
  - Bandim Health Project, INDEPTH Network, Bissau, Guinea-Bissau
  - Bandim Health Project, Research unit Odense Patient Data Explorative Network (OPEN), Department of Clinical Research, Odense University Hospital and University of Southern Denmark, Odense, Denmark
- Ane Bærent Fisker
  - Bandim Health Project, INDEPTH Network, Bissau, Guinea-Bissau
  - Bandim Health Project, Research unit Odense Patient Data Explorative Network (OPEN), Department of Clinical Research, Odense University Hospital and University of Southern Denmark, Odense, Denmark
3
Hannigan C, Kelly M, Holton E, Lawlor B, Scharf T, Kee F, Moynihan S, O’Reilly A, McHugh Power J. Mechanisms through which befriending services may impact the health of older adults: A dyadic qualitative investigation. J Health Psychol 2024; 30:13591053241235846. [PMID: 38439512 PMCID: PMC11686925 DOI: 10.1177/13591053241235846] [Indexed: 03/06/2024]
Abstract
Befriending services are often delivered to older adults with a view to improving social connectedness, but they may also lead to improved health. The objective of the current study was to explore potential mechanisms through which befriending services might impact the health of older adults. Data were collected from 13 befriendee-befriender dyads (n = 26) using a constructivist grounded theory and dyadic analytic approach. Potential mechanisms were described using a realist evaluative framework of mechanistic processes in a complex intervention context. Five mechanisms of action triggered by the intervention were identified: supporting health behaviours; providing emotional support; improving mood; getting cognitive stimulation and novelty; and providing opportunities for socialising. Our results suggest that befriending might positively impact the health of older people through these mechanisms, which should be evaluated empirically in future research.
4
Wibaek R, Andersen GS, Dahm CC, Witte DR, Hulman A. Large Language Models for Epidemiological Research via Automated Machine Learning: Case Study Using Data From the British National Child Development Study. JMIR Med Inform 2023; 11:e43638. [PMID: 37787655 PMCID: PMC10547934 DOI: 10.2196/43638] [Received: 10/21/2022] [Revised: 06/29/2023] [Accepted: 07/22/2023] [Indexed: 10/04/2023]
Abstract
Background Large language models have had a huge impact on natural language processing (NLP) in recent years. However, their application in epidemiological research is still limited to the analysis of electronic health records and social media data. Objectives To demonstrate the potential of NLP beyond these domains, we aimed to develop prediction models based on texts collected from an epidemiological cohort and compare their performance to classical regression methods. Methods We used data from the British National Child Development Study, where 10,567 children aged 11 years wrote essays about how they imagined themselves as 25-year-olds. Overall, 15% of the data set was set aside as a test set for performance evaluation. Pretrained language models were fine-tuned using AutoTrain (Hugging Face) to predict current reading comprehension score (range: 0-35) and future BMI and physical activity (active vs inactive) at the age of 33 years. We then compared their predictive performance (accuracy or discrimination) with linear and logistic regression models, including demographic and lifestyle factors of the parents and children from birth to the age of 11 years as predictors. Results NLP clearly outperformed linear regression when predicting reading comprehension scores (root mean square error: 3.89, 95% CI 3.74-4.05 for NLP vs 4.14, 95% CI 3.98-4.30 and 5.41, 95% CI 5.23-5.58 for regression models with and without general ability score as a predictor, respectively). Predictive performance for physical activity was similarly poor for the 2 methods (area under the receiver operating characteristic curve: 0.55, 95% CI 0.52-0.60 for both) but was slightly better than random assignment, whereas linear regression clearly outperformed the NLP approach when predicting BMI (root mean square error: 4.38, 95% CI 4.02-4.74 for NLP vs 3.85, 95% CI 3.54-4.16 for regression). The NLP approach did not perform better than simply assigning the mean BMI from the training set as a predictor. Conclusions Our study demonstrated the potential of using large language models on text collected from epidemiological studies. The performance of the approach appeared to depend on how directly the topic of the text was related to the outcome. Open-ended questions specifically designed to capture certain health concepts and lived experiences in combination with NLP methods should receive more attention in future epidemiological studies.
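The comparison against "assigning the mean BMI from the training set" mentioned in this abstract is a standard baseline check. A minimal sketch of that check follows, assuming a held-out test set as described in the abstract; the helper names and all numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mean_baseline_rmse(train_targets, test_targets):
    """RMSE obtained by predicting the training-set mean for every test case."""
    baseline = sum(train_targets) / len(train_targets)
    return rmse(test_targets, [baseline] * len(test_targets))


# Hypothetical BMI values, NOT data from the study.
train_bmi = [22.0, 25.0, 28.0, 25.0]
test_bmi = [24.0, 26.0, 29.0, 21.0]
model_pred = [25.5, 25.0, 26.0, 24.0]

baseline_score = mean_baseline_rmse(train_bmi, test_bmi)
model_score = rmse(test_bmi, model_pred)
# A model is only informative if its error is below the baseline error.
model_beats_baseline = model_score < baseline_score
```

The baseline mean is taken from the training targets only, so the test set plays no part in fitting it.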
Affiliation(s)
- Daniel R Witte
  - Department of Public Health, Aarhus University, Aarhus, Denmark
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
- Adam Hulman
  - Department of Public Health, Aarhus University, Aarhus, Denmark
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
5
Abstract
OBJECTIVES The objective of this study is to highlight innovative research and contemporary trends in the area of Public Health and Epidemiology Informatics (PHEI). METHODS Following a similar approach to last year's edition, a meticulous search was conducted on PubMed (with keywords including topics related to Public Health, Epidemiological Surveillance and Medical Informatics), examining a total of 2,022 scientific publications on PHEI. The resulting references were thoroughly examined by the three section editors. Subsequently, 10 papers were chosen as potential candidates for the best paper award. These selected papers were then peer-reviewed by six external reviewers, in addition to the section editors and two chief editors of the IMIA Yearbook of Medical Informatics. Each paper underwent a total of five reviews. RESULTS Out of the 539 references retrieved from PubMed, only two were deemed worthy of the best paper award, although four papers had the potential to qualify in total. The first best paper pertains to a study about the need for a new annotation framework due to inadequacies in existing methods and resources. The second paper elucidates the use of Weibo data to monitor the health of Chinese urbanites. The correlation between air pollution and health sensing was measured via generalized additive models. CONCLUSIONS One of the primary findings of this edition is the dearth of studies identified for the PHEI section, which represents a significant decline compared to the previous edition. This is particularly surprising given that the post-COVID period should have led to an increased use of information and communication technology for public health issues.
Affiliation(s)
- Gayo Diallo
  - Univ. Bordeaux, Inserm, BPH, U1219, F-33000 Bordeaux, France
- Georgeta Bordea
  - Univ. La Rochelle, L3i, EA 2118, F-17000, La Rochelle, France
- Cécilia Samieri
  - Univ. Bordeaux, Inserm, BPH, U1219, F-33000 Bordeaux, France