1
Widmer JA, Stocker M, Smith JE, Coffin A, Pisani O, Strickland T, Sharma M, Pachepsky Y, Dunn LL. Spatiotemporal trends of Escherichia coli levels and their influences vary among ponds in the coastal plain of Georgia. JOURNAL OF ENVIRONMENTAL QUALITY 2025; 54:647-661. [PMID: 40164960 PMCID: PMC12065067 DOI: 10.1002/jeq2.70018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2024] [Accepted: 02/18/2025] [Indexed: 04/02/2025]
Abstract
Quantification of Escherichia coli in water is commonly used to assess a surface water source's suitability for produce irrigation. Location, season, and physicochemical water quality affect E. coli levels in irrigation ponds. From July 2021 through September 2023, water samples were collected periodically along a sampling grid at three ponds in Southeast Georgia. The samples were quantified for E. coli, and relevant physicochemical water quality parameters were measured at the same time. Mean relative differences (MRDs) were calculated for each collection point to characterize differences in E. coli levels across sampling locations. E. coli levels varied significantly across sampling areas (perimeter, surface, and subsurface) at each pond. MRD values of log most probable number (MPN) E. coli per 100 mL (EC MRD) ranged from -0.25 to 0.33 in Pond 1, -1.5 to 0.65 in Pond 2, and -1.25 to 0.65 in Pond 3. In Pond 1, EC MRD correlated positively with the MRDs of chlorophyll and turbidity. It correlated negatively with the MRDs of dissolved organic matter, dissolved oxygen (DO), specific conductance, and pH. In Pond 2, EC MRD correlated with the MRDs of chlorophyll, DO, phycocyanin, pH, and temperature. In Pond 3, EC MRD correlated positively with nitrate MRD. This work showed that MRD analysis may reveal stable patterns of E. coli and the physicochemical factors that affect these levels in ponds. However, no universal covariates were identified that could estimate E. coli levels. These findings may provide context for water quality managers who wish to augment E. coli measurements with other factors, or to better represent variable E. coli levels with MRD.
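In temporal-stability analyses, the MRD statistic is commonly defined in two steps. First, take the relative difference between a location's value and the spatial mean on each sampling date. Then average those differences over all dates. A minimal Python sketch of that definition follows. It assumes a complete locations-by-dates matrix of log10 MPN E. coli values with no missing samples. The function name and the example values are hypothetical; they are not the authors' code or data.

```python
import numpy as np


def mean_relative_difference(values):
    """Return the mean relative difference (MRD) of each sampling location.

    values: 2-D array with rows = sampling locations and columns = sampling
    dates (assumed complete, e.g., log10 MPN E. coli per 100 mL).
    """
    values = np.asarray(values, dtype=float)
    # Spatial mean over all locations, computed separately for each date
    date_means = values.mean(axis=0)
    # Relative difference of every location from that date's spatial mean
    rel_diff = (values - date_means) / date_means
    # MRD: average each location's relative differences over all dates
    return rel_diff.mean(axis=1)


# Hypothetical example: 3 sampling locations x 4 sampling dates
log_ec = np.array([
    [2.0, 1.5, 3.0, 2.5],
    [1.0, 1.5, 2.0, 1.5],
    [3.0, 1.5, 1.0, 2.0],
])
mrd = mean_relative_difference(log_ec)
print(mrd)
```

Each date's relative differences sum to zero across locations, so the MRDs in this sketch also sum to zero. A location with a positive MRD tends to sit above the spatial mean. Relative differences become unstable when a date's spatial mean is close to zero, so the data are worth checking before this formula is applied.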
Affiliation(s)
- J. Andrew Widmer
- Department of Food Science and Technology, University of Georgia, Athens, Georgia, USA
- Matthew Stocker
- USDA-ARS, Environmental Microbial and Food Safety Laboratory, Beltsville, Maryland, USA
- Jaclyn E. Smith
- USDA-ARS, Environmental Microbial and Food Safety Laboratory, Beltsville, Maryland, USA
- Alisa Coffin
- USDA-ARS, Southeast Watershed Research Laboratory, Tifton, Georgia, USA
- Oliva Pisani
- USDA-ARS, Southeast Watershed Research Laboratory, Tifton, Georgia, USA
- Manan Sharma
- USDA-ARS, Environmental Microbial and Food Safety Laboratory, Beltsville, Maryland, USA
- Yakov Pachepsky
- USDA-ARS, Environmental Microbial and Food Safety Laboratory, Beltsville, Maryland, USA
- Laurel L. Dunn
- Department of Food Science and Technology, University of Georgia, Athens, Georgia, USA

2
Qian C, Lee RT, Weachock RL, Wiedmann M, Martin NH. A Machine-Learning Approach Reveals That Bacterial Spore Levels in Organic Bulk Tank Milk are Dependent on Farm Characteristics and Meteorological Factors. J Food Prot 2025; 88:100477. [PMID: 40058735 DOI: 10.1016/j.jfp.2025.100477] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 03/02/2025] [Accepted: 03/04/2025] [Indexed: 03/20/2025]
Abstract
Bacterial spores in raw milk can cause quality issues in milk and milk-derived products. Because these spores originate from farm environments, it is important to understand how farm-level factors contribute to spore levels. This study used machine learning models to investigate the impact of farm management practices and meteorological factors on levels of different spore types in organic raw milk. Raw milk from certified organic dairy farms (n = 102) across 11 states was collected 6 times over a year. It was tested for standard plate count, psychrotolerant spore count, mesophilic spore count, thermophilic spore count, and butyric acid bacteria. At each sampling date, a survey on farm management practices was collected. Meteorological factors were obtained for the sampling date and for 1, 2, and 3 days prior. To address confounders, the dataset was stratified into subdatasets, separately by use of a milking parlor, number of years since organic certification, and pasture time. We constructed random forest regression models to predict log10 mesophilic spore count, log10 thermophilic spore count, and log10 most probable number (MPN) of butyric acid bacteria. We also built a random forest classification model to classify the presence of psychrotolerant spores in each raw milk sample. Summary statistics showed that spore levels varied considerably among certified organic farms. However, they were only slightly higher than levels reported for conventional farms in previous longitudinal studies. Variable importance plots from the models suggest that the following are important for spore levels in organic raw milk: herd size, certification year, employee-related variables, and udder clipping and flaming. Partial dependence plots show that these variables have small effects, which suggests the need for an individualized, risk-based approach to managing spore levels. Incorporating novel data streams could enhance the model's performance as a real-time monitoring tool.
Affiliation(s)
- Chenhao Qian
- Department of Food Science, Cornell University, Ithaca, New York, United States
- Renee T Lee
- Department of Food Science, Cornell University, Ithaca, New York, United States
- Rachel L Weachock
- Department of Food Science, Cornell University, Ithaca, New York, United States
- Martin Wiedmann
- Department of Food Science, Cornell University, Ithaca, New York, United States
- Nicole H Martin
- Department of Food Science, Cornell University, Ithaca, New York, United States

3
Hofstetter J, Holcomb DA, Kahler AM, Rodrigues C, da Silva ALBR, Mattioli MC. Performance of Conditional Random Forest and Regression Models at Predicting Human Fecal Contamination of Produce Irrigation Ponds in the Southeastern United States. ACS ES&T WATER 2024; 4:5844-5855. [PMID: 39734778 PMCID: PMC11672865 DOI: 10.1021/acsestwater.4c00839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/31/2024]
Abstract
Irrigating fresh produce with contaminated water contributes to the burden of foodborne illness. Identifying fecal contamination of irrigation waters and characterizing fecal sources and associated environmental factors can help inform fresh produce safety and health hazard management. Using two previously collected data sets, we developed and evaluated the performance of logistic regression and conditional random forest models for predicting general and human-specific fecal contamination of ponds in southwest Georgia used for fresh produce irrigation. Generic Escherichia coli served as a general fecal indicator, and human-associated Bacteroides (HF183), crAssphage, and F+ coliphage genogroup II were used as indicators of human fecal contamination. Increased rainfall in the previous 7 days and the presence of a building within 152 m (a proxy for proximity to septic systems) were associated with increased odds of human fecal contamination in the training data set. However, the models did not accurately predict the presence of human-associated fecal indicators in a second data set collected from nearby irrigation ponds in different years. Predictive statistical models should be used with caution to assess produce irrigation water quality as models may not reliably predict fecal contamination at other locations and times, even within the same growing region.
Affiliation(s)
- Jessica Hofstetter
- Waterborne Disease Prevention Branch, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, United States; Chenega Enterprise Systems & Solutions, LLC, Chesapeake, Virginia 23320, United States; Department of Horticulture, Auburn University, Auburn, Alabama 36849, United States
- David A Holcomb
- Waterborne Disease Prevention Branch, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, United States
- Amy M Kahler
- Waterborne Disease Prevention Branch, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, United States
- Camila Rodrigues
- Department of Horticulture, Auburn University, Auburn, Alabama 36849, United States
- Mia C Mattioli
- Waterborne Disease Prevention Branch, Centers for Disease Control and Prevention, Atlanta, Georgia 30333, United States

4
Jiang P, Sun S, Goh SG, Tong X, Chen Y, Yu K, He Y, Gin KYH. A rapid approach with machine learning for quantifying the relative burden of antimicrobial resistance in natural aquatic environments. WATER RESEARCH 2024; 262:122079. [PMID: 39047454 DOI: 10.1016/j.watres.2024.122079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 06/05/2024] [Accepted: 07/09/2024] [Indexed: 07/27/2024]
Abstract
The massive use and discharge of antibiotics have led to increasing concern about antimicrobial resistance (AMR) in natural aquatic environments. The dose-response mechanisms of pathogens with AMR are not yet fully understood. Collecting data on antibiotic resistance genes and bacteria through field sampling and laboratory testing is also time-consuming and expensive. For both reasons, designing a rapid approach to quantify the burden of AMR in natural aquatic environments remains a challenge. To address it, an integrated machine-learning framework was developed by investigating the associations between the relative burden of AMR and easily accessible variables (i.e., relevant environmental variables and adjacent land-use patterns). Results from a real-world case analysis show that the new approach cuts the time needed for quantification to approximately 0.5 hours. Traditional procedures based on field sampling and laboratory testing typically take 3-7 days. Moreover, all five metrics for quantifying the relative burden of AMR exceed the 85% threshold, with the F1-score surpassing 0.92. The framework's adaptive random forest model was compared with logistic regression, decision trees, and a basic random forest. It significantly improves quantification accuracy without sacrificing model interpretability. Three key features were identified for rapid quantification: two environmental variables (dissolved oxygen and resistivity) and the proportion of green areas. This study enriches burden analyses and management practices by enabling rapid quantification of the relative burden of AMR without dose-response information.
Affiliation(s)
- Peng Jiang
- Department of Industrial Engineering and Management, Business School, Sichuan University, Chengdu 610064, China; NUS Environmental Research Institute, National University of Singapore, Singapore 117411, Singapore
- Shuyi Sun
- Department of Industrial Engineering and Management, Business School, Sichuan University, Chengdu 610064, China; Department of Industrial Systems Engineering & Management, National University of Singapore, Singapore 119260, Singapore
- Shin Giek Goh
- NUS Environmental Research Institute, National University of Singapore, Singapore 117411, Singapore
- Xuneng Tong
- NUS Environmental Research Institute, National University of Singapore, Singapore 117411, Singapore
- Yihan Chen
- School of Resources and Environmental Engineering, Hefei University of Technology, Hefei 230009, China
- Kaifeng Yu
- School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Yiliang He
- School of Environmental Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Karina Yew-Hoong Gin
- NUS Environmental Research Institute, National University of Singapore, Singapore 117411, Singapore; Department of Civil & Environmental Engineering, National University of Singapore, Singapore 117576, Singapore

5
Hong SM, Morgan BJ, Stocker MD, Smith JE, Kim MS, Cho KH, Pachepsky YA. Using machine learning models to estimate Escherichia coli concentration in an irrigation pond from water quality and drone-based RGB imagery data. WATER RESEARCH 2024; 260:121861. [PMID: 38875854 DOI: 10.1016/j.watres.2024.121861] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 05/29/2024] [Accepted: 05/30/2024] [Indexed: 06/16/2024]
Abstract
The rapid and efficient quantification of Escherichia coli concentrations is crucial for monitoring water quality. Remote sensing techniques and machine learning algorithms have been used to detect E. coli in water and estimate its concentrations. However, limited sample availability and unbalanced water quality datasets make these approaches hard to apply. In this study, we estimated E. coli concentrations in an irrigation pond in Maryland, USA, during the summer season. We used demosaiced natural color (red, green, and blue: RGB) imagery in the visible and infrared spectral ranges and a set of 14 water quality parameters. We deployed four machine learning models: Random Forest (RF), Gradient Boosting Machine (GBM), Extreme Gradient Boosting (XGB), and K-nearest Neighbor (KNN). Each was run under three data utilization scenarios: water quality parameters only, combined water quality and small unmanned aircraft system (sUAS)-based RGB data, and RGB data only. To select the training and test datasets, we applied two data-splitting methods: ordinary and quantile data splitting. Quantile data splitting applied a constant splitting ratio in each decile of the E. coli concentration distribution. It resulted in better model performance metrics and smaller differences between the metrics for the training and test datasets. With quantile data splitting after hyperparameter optimization, the RF, GBM, and XGB models had R² values above 0.847 for the training dataset and above 0.689 for the test dataset. Combining water quality and RGB imagery data resulted in a higher R² value (>0.896) for the test dataset. SHapley Additive exPlanations (SHAP) of relative variable importance revealed that visible blue spectrum intensity and water temperature were the most influential parameters in the RF model. Demosaiced RGB imagery served as a useful predictor of E. coli concentration in the studied irrigation pond.
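The abstract contrasts ordinary and quantile data splitting. One reading of quantile splitting is stratified sampling on the target. Samples are binned by deciles of E. coli concentration, and the same test fraction is drawn from every bin. An ordinary split instead draws the test set at random from all samples. The sketch below implements that reading with NumPy. The function name, the 30% test fraction, the seed, and the synthetic concentrations are assumptions for illustration, not the study's settings.

```python
import numpy as np


def quantile_split(y, test_frac=0.3, n_bins=10, seed=0):
    """Split sample indices so that every quantile bin of y has the same test fraction.

    Hypothetical helper: with n_bins=10 the bins are the deciles of y.
    """
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    # Bin edges at evenly spaced quantiles of the target distribution
    edges = np.quantile(y, np.linspace(0.0, 1.0, n_bins + 1))
    # Bin index 0..n_bins-1 for each sample, using only the interior edges
    bins = np.digitize(y, edges[1:-1], right=True)
    train_idx, test_idx = [], []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        rng.shuffle(idx)
        # Same test fraction within every bin
        n_test = int(round(test_frac * len(idx)))
        test_idx.extend(idx[:n_test].tolist())
        train_idx.extend(idx[n_test:].tolist())
    return np.array(sorted(train_idx), dtype=int), np.array(sorted(test_idx), dtype=int)


# Hypothetical example: 100 log-normally distributed concentrations
y = 10 ** np.random.default_rng(42).normal(2.0, 1.0, 100)
train, test = quantile_split(y)
print(len(train), len(test))
```

Stratifying by decile keeps the rare high-concentration samples represented in both the training and test sets. Under an ordinary random split they can end up mostly on one side.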
Affiliation(s)
- Seok Min Hong
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA; Department of Civil Urban Earth and Environmental Engineering, Ulsan National Institute of Science and Technology, UNIST-gil 50, Ulsan, 44919, South Korea
- Billie J Morgan
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Matthew D Stocker
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Jaclyn E Smith
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Moon S Kim
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA
- Kyung Hwa Cho
- School of Civil, Environmental and Architectural Engineering, Korea University, Seoul, 02841, South Korea
- Yakov A Pachepsky
- USDA-ARS Environmental Microbial and Food Safety Laboratory, 10300 Baltimore Ave, Bldg. 173, Beltsville, MD, 20705, USA

6
Sun L, Li M, Liu B, Li R, Deng H, Zhu X, Zhu X, Tsang DCW. Machine learning for municipal sludge recycling by thermochemical conversion towards sustainability. BIORESOURCE TECHNOLOGY 2024; 394:130254. [PMID: 38151207 DOI: 10.1016/j.biortech.2023.130254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 12/09/2023] [Accepted: 12/23/2023] [Indexed: 12/29/2023]
Abstract
The sustainable disposal of high-moisture municipal sludge (MS) has received increasing attention. Thermochemical conversion technologies can recycle MS into liquid or gaseous biofuels and value-added solid products. In this review, we statistically compared the energy recovery potential of common thermochemical technologies (i.e., incineration, pyrolysis, and hydrothermal conversion) for MS disposal. The comparison indicated that hydrothermal conversion has great potential for energy recovery from MS. We discuss how machine learning (ML) can be applied to MS recycling to decipher the complex relationships among MS components, process parameters, and physicochemical reactions. Future studies should develop comprehensive ML models that account for the successive reaction processes of thermochemical conversion. We also propose challenges and prospects for making ML more effective in the thermochemical conversion of MS, covering data collection and preprocessing, model optimization, and interpretability. This review sheds light on using ML to explore the mechanisms of MS thermochemical recycling and provides practical guidance for MS recycling.
Affiliation(s)
- Lianpeng Sun
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Mingxuan Li
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Bingyou Liu
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Ruohong Li
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Huanzhong Deng
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China
- Xiefei Zhu
- School of Advanced Energy, Sun Yat-sen University, Shenzhen 518107, China
- Xinzhe Zhu
- School of Environmental Science and Engineering, Sun Yat-sen University, Guangzhou 510275, China; Guangdong Provincial Key Laboratory of Environmental Pollution Control and Remediation Technology, Sun Yat-sen University, Guangzhou 510275, China
- Daniel C W Tsang
- Department of Civil and Environmental Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, China

7
Yoon M, Park JJ, Hur T, Hua CH, Hussain M, Lee S, Choi DJ. Application and Potential of Artificial Intelligence in Heart Failure: Past, Present, and Future. INTERNATIONAL JOURNAL OF HEART FAILURE 2024; 6:11-19. [PMID: 38303917 PMCID: PMC10827704 DOI: 10.36628/ijhf.2023.0050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 11/24/2023] [Accepted: 11/26/2023] [Indexed: 02/03/2024]
Abstract
The prevalence of heart failure (HF) is increasing, necessitating accurate diagnosis and tailored treatment. The accumulation of clinical information from patients with HF generates big data, which poses challenges for traditional analytical methods. To address this, big data approaches and artificial intelligence (AI) have been developed that can effectively predict future observations and outcomes, enabling precise diagnoses and personalized treatments of patients with HF. Machine learning (ML) is a subfield of AI that allows computers to analyze data, find patterns, and make predictions without explicit instructions. ML can be supervised, unsupervised, or semi-supervised. Deep learning is a branch of ML that uses artificial neural networks with multiple layers to find complex patterns. These AI technologies have shown significant potential in various aspects of HF research, including diagnosis, outcome prediction, classification of HF phenotypes, and optimization of treatment strategies. In addition, integrating multiple data sources, such as electrocardiography, electronic health records, and imaging data, can enhance the diagnostic accuracy of AI algorithms. Currently, wearable devices and remote monitoring aided by AI enable the earlier detection of HF and improved patient care. This review focuses on the rationale behind utilizing AI in HF and explores its various applications.
Affiliation(s)
- Minjae Yoon
- Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea
- Jin Joo Park
- Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea
- Taeho Hur
- Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Korea
- Cam-Hao Hua
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Korea
- Musarrat Hussain
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Korea
- Sungyoung Lee
- Department of Computer Science and Engineering, Kyung Hee University, Yongin, Korea
- Dong-Ju Choi
- Division of Cardiology, Department of Internal Medicine, Seoul National University Bundang Hospital, Seoul National University College of Medicine, Seongnam, Korea

8
Reynaert E, Steiner P, Yu Q, D'Olif L, Joller N, Schneider MY, Morgenroth E. Predicting microbial water quality in on-site water reuse systems with online sensors. WATER RESEARCH 2023; 240:120075. [PMID: 37263119 DOI: 10.1016/j.watres.2023.120075] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/14/2022] [Revised: 03/24/2023] [Accepted: 05/11/2023] [Indexed: 06/03/2023]
Abstract
Widespread implementation of on-site water reuse is hindered by the limited availability of monitoring approaches that ensure microbial quality during operation. In this study, we developed a methodology for monitoring microbial water quality in on-site water reuse systems using inexpensive, commercially available online sensors. We collected an extensive dataset of sensor and microbial water quality data covering six of the most critical types of disruption in membrane bioreactors with chlorination. We then tested three types of machine learning algorithms (logistic regression, support-vector machine, and random forest) on predicting microbial water quality as "safe" or "unsafe" for reuse. The main criterion for model optimization was a low false positive rate (FPR), which is essential to protect users' health. FPR is the percentage of actually unsafe samples that are predicted as safe. We therefore enforced a fixed FPR ≤ 2%. Maximizing the true positive rate (TPR), the percentage of actually safe samples that are predicted as safe, was given second priority. Our results show that logistic-regression-based models achieved the highest TPR using only two of the six sensors: free chlorine and oxidation-reduction potential. Including sensor slopes as engineered features made it possible to reach similar TPRs using only one sensor instead of two. Analysis of false predictions showed that they were mostly early alarms, a characteristic that could be regarded as an asset in alarm management. In conclusion, the simplest algorithm combined with only one or two sensors performed best at predicting microbial water quality. This result offers insights for water quality modeling and for other applications where small datasets are a common challenge. In such settings, simpler models may hold a general advantage: they reduce the risk of overfitting, allow better interpretability, and require less computational power.
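This abstract's optimization rule has two steps, with "safe" as the positive class. First, hold FPR at or below 2%. Then maximize TPR. The rule can be illustrated as a threshold search over a classifier's predicted probability of "safe". The sketch below assumes such probabilities are already available from some fitted model. The function name, scores, and labels are hypothetical and are not taken from the study.

```python
import numpy as np


def pick_threshold(p_safe, is_safe, max_fpr=0.02):
    """Return (tpr, threshold, fpr) maximizing TPR subject to FPR <= max_fpr.

    p_safe:  predicted probability that each sample is safe (hypothetical model output)
    is_safe: true labels, True = safe (the positive class); both classes must be present
    FPR: share of actually unsafe samples that are predicted safe
    TPR: share of actually safe samples that are predicted safe
    """
    p_safe = np.asarray(p_safe, dtype=float)
    is_safe = np.asarray(is_safe, dtype=bool)
    best = None
    # Candidate thresholds: every observed score, plus +inf (nothing predicted safe)
    for t in np.append(np.unique(p_safe), np.inf):
        pred_safe = p_safe >= t
        fpr = pred_safe[~is_safe].mean()
        tpr = pred_safe[is_safe].mean()
        # Enforce the FPR ceiling first; among feasible thresholds prefer higher TPR,
        # then lower FPR
        if fpr <= max_fpr and (best is None or (tpr, -fpr) > (best[0], -best[2])):
            best = (float(tpr), float(t), float(fpr))
    return best


# Hypothetical scores and labels for 8 samples (4 safe, 4 unsafe)
scores = [0.95, 0.90, 0.85, 0.80, 0.60, 0.40, 0.30, 0.20]
labels = [True, True, True, False, True, False, False, False]
print(pick_threshold(scores, labels))
```

The +inf candidate always satisfies the FPR ceiling, so a feasible threshold always exists. This mirrors the study's ordering of priorities, in which protecting users comes before catching every safe sample.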
Affiliation(s)
- Eva Reynaert
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland; ETH Zürich, Institute of Environmental Engineering, 8093 Zürich, Switzerland
- Philipp Steiner
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland
- Qixing Yu
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland; Ecole Polytechnique Fédérale de Lausanne (EPFL), Section of Environmental Sciences and Engineering, 1015 Lausanne, Switzerland
- Lukas D'Olif
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland; ETH Zürich, Institute of Environmental Engineering, 8093 Zürich, Switzerland
- Noah Joller
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland; ETH Zürich, Institute of Environmental Engineering, 8093 Zürich, Switzerland
- Mariane Y Schneider
- The University of Tokyo, Next Generation Artificial Intelligence Research Center & School of Information Science and Technology, 113-8656 Tokyo, Japan
- Eberhard Morgenroth
- Eawag, Swiss Federal Institute of Aquatic Science and Technology, 8600 Dübendorf, Switzerland; ETH Zürich, Institute of Environmental Engineering, 8093 Zürich, Switzerland

9
Averbuch T, Sullivan K, Sauer A, Mamas MA, Voors AA, Gale CP, Metra M, Ravindra N, Van Spall HGC. Applications of artificial intelligence and machine learning in heart failure. EUROPEAN HEART JOURNAL. DIGITAL HEALTH 2022; 3:311-322. [PMID: 36713018 PMCID: PMC9707916 DOI: 10.1093/ehjdh/ztac025] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 01/25/2022] [Revised: 04/15/2022] [Indexed: 02/01/2023]
Abstract
Machine learning (ML) is a sub-field of artificial intelligence that uses computer algorithms to extract patterns from raw data, acquire knowledge without human input, and apply this knowledge for various tasks. Traditional statistical methods that classify or regress data have limited capacity to handle large datasets that have a low signal-to-noise ratio. In contrast to traditional models, ML relies on fewer assumptions, can handle larger and more complex datasets, and does not require predictors or interactions to be pre-specified, allowing for novel relationships to be detected. In this review, we discuss the rationale for the use and applications of ML in heart failure, including disease classification, early diagnosis, early detection of decompensation, risk stratification, optimal titration of medical therapy, effective patient selection for devices, and clinical trial recruitment. We discuss how ML can be used to expedite implementation and close healthcare gaps in learning healthcare systems. We review the limitations of ML, including opaque logic and unreliable model performance in the setting of data errors or data shift. Whilst ML has great potential to improve clinical care and research in HF, the applications must be externally validated in prospective studies for broad uptake to occur.
Affiliation(s)
- Tauben Averbuch
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Kristen Sullivan
- Department of Medicine, McMaster University, Hamilton, Ontario, Canada
- Andrew Sauer
- Department of Cardiology, University of Kansas Health System, Kansas City, KS, USA
- Mamas A Mamas
- Keele Cardiovascular Research Group, Keele University, Stoke-on-Trent, Staffordshire
- Chris P Gale
- Department of Cardiology, University of Leeds, Leeds, West Yorkshire
- Marco Metra
- Azienda Socio Sanitaria Territoriale Spedali Civili and University of Brescia, Brescia, Italy
- Neal Ravindra
- Department of Computer Science, Yale University, New Haven, CT, USA

10
Buyrukoğlu S, Yılmaz Y, Topalcengiz Z. Correlation value determined to increase Salmonella prediction success of deep neural network for agricultural waters. ENVIRONMENTAL MONITORING AND ASSESSMENT 2022; 194:373. [PMID: 35435507 DOI: 10.1007/s10661-022-10050-7] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/27/2021] [Accepted: 04/09/2022] [Indexed: 06/14/2023]
Abstract
Computer-based tools have become increasingly popular in the field of produce safety. Various algorithms have been applied to predict the population and presence of indicator microorganisms and pathogens in agricultural water sources. This study aimed to improve the Salmonella prediction success of a deep feed-forward neural network (DFNN) in agricultural surface waters. It did so by using a correlation value determined from selected features. Datasets were collected from six agricultural ponds in Central Florida. Machine learning algorithms (decision tree, random forest, and support vector machine) were used to predict generic Escherichia coli populations. The gain ratio selected the most successful physicochemical and environmental features for this prediction. The DFNN's Salmonella prediction success was then evaluated on datasets that combined these selected features with predicted E. coli populations, with and without the correlation value. The correlation value was evaluated across all possible combinations (nCr) of the six pond datasets. DFNN analyses with the correlation value achieved higher accuracies (88.89% to 98.41%) than those without it (83.68% to 96.99%) across all dataset combinations. These findings highlight the effectiveness of the determined correlation value for predicting Salmonella presence in agricultural surface waters.
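This study used the gain ratio to select features. It is usually defined, as in C4.5 decision trees, as the information gain of splitting on a feature divided by that feature's split information. The division penalizes features with many distinct values. The Python sketch below computes the gain ratio for one already-discretized feature. The example feature and class labels are hypothetical, and the abstract does not describe how the study discretized continuous measurements.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(feature, labels):
    """Information gain from splitting `labels` by `feature`, divided by split information."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the class labels after the split
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information: entropy of the feature's own value distribution
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    # A feature with a single value carries no split information
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical discretized feature (turbidity class) and class labels
turbidity = ["high", "high", "low", "low"]
ec_class = ["above", "above", "below", "below"]
print(gain_ratio(turbidity, ec_class))
```

In this sketch, a feature that perfectly separates the classes scores 1.0. A feature whose values are unrelated to the classes scores 0.0.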
Affiliation(s)
- Selim Buyrukoğlu
- Department of Computer Engineering, Faculty of Engineering, Çankırı Karatekin University, 18100, Çankırı, Turkey
- Yıldıran Yılmaz
- Computer Engineering Department, Faculty of Engineering and Architecture, Recep Tayyip Erdogan University, 53020, Rize, Turkey
- Zeynal Topalcengiz
- Department of Food Engineering, Faculty of Engineering and Architecture, Muş Alparslan University, 49250, Muş, Turkey

11
Li L, Qiao J, Yu G, Wang L, Li HY, Liao C, Zhu Z. Interpretable tree-based ensemble model for predicting beach water quality. WATER RESEARCH 2022; 211:118078. [PMID: 35066260 DOI: 10.1016/j.watres.2022.118078] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 08/14/2021] [Revised: 11/29/2021] [Accepted: 01/12/2022] [Indexed: 06/14/2023]
Abstract
Tree-based machine learning models built on environmental features offer low-cost and timely solutions for predicting microbial fecal contamination in beach water and informing the public of the health risk. However, many of these models are black boxes that are difficult for humans to understand, which can have severe consequences such as unexplained decisions and failures of accountability. To develop interpretable predictive models for beach water quality, we evaluated five tree-based models (classification tree, random forest, CatBoost, XGBoost, and LightGBM) and employed SHAP, a state-of-the-art explanation method, to explain them. When tested on Escherichia coli (E. coli) concentration data collected from three beach sites along the Lake Erie shore, LightGBM, followed by XGBoost, achieved the highest averaged precision and recall scores. At all three sites, both models identified lake turbidity as the most important predictor and highlighted the crucial role of accurate local wave height and rainfall data in model development. Local SHAP values further revealed that the importance of lake turbidity is robust: its SHAP value increases nearly monotonically with its value and is minimally affected by other environmental factors. Moreover, we found an intriguing interaction between lake turbidity and day-of-year. This work suggests that combining LightGBM with SHAP is a promising way to develop interpretable models for predicting microbial water quality in freshwater lakes.
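SHAP attributes each individual prediction to the input features using Shapley values from cooperative game theory. The abstract does not say which explainer was used; for tree ensembles the shap library offers TreeExplainer (TreeSHAP), which computes these values efficiently. The brute-force sketch below only illustrates what a Shapley value is, assuming a single baseline point stands in for the background data; the function name and the toy models are hypothetical, and this is not how the shap library computes its values.

```python
import itertools
import math
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Features in a coalition take their values from x; all other features
    are held at the baseline. Cost grows as 2^n, so this suits tiny n only.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(point)

    phi: List[float] = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical toy model with a pure two-feature interaction.
print(shapley_values(lambda z: z[0] * z[1], [1.0, 1.0], [0.0, 0.0]))  # [0.5, 0.5]
```

For the product model the interaction is split equally between the two features, and in general the values sum to model(x) - model(baseline), which is what makes per-prediction (local) explanations like those in the abstract additive.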
Affiliation(s)
- Lingbo Li
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Jundong Qiao
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Guan Yu
- Department of Biostatistics, University at Buffalo, The State University of New York, Buffalo, NY, USA
- Leizhi Wang
- Nanjing Hydraulic Research Institute, State Key Laboratory of Hydrology, Water Resources and Hydraulic Engineering & Science, Nanjing 210029, China
- Hong-Yi Li
- Department of Civil and Environmental Engineering, University of Houston, Houston, TX, USA
- Chen Liao
- Program for Computational and Systems Biology, Memorial Sloan-Kettering Cancer Center, NY, USA
- Zhenduo Zhu
- Department of Civil, Structural and Environmental Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA
12
Precision Irrigation Management Using Machine Learning and Digital Farming Solutions. AGRIENGINEERING 2022. [DOI: 10.3390/agriengineering4010006] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 01/27/2023]
Abstract
Freshwater is essential for irrigation and for supplying the nutrients needed for plant growth, compensating for inadequate rainfall. Agricultural activities use around 70% of the available freshwater, which underscores the importance of responsible management using smart agricultural water technologies. This paper investigates research on integrating different machine learning models to provide optimal irrigation decision management. It reviews the research trends and applicability of machine learning techniques, as well as the deployment of the resulting machine learning models for use by farmers toward sustainable irrigation management. It further discusses how digital farming solutions, such as mobile and web frameworks, can enable the management of smart irrigation processes, with the aim of reducing the burden on farmers and researchers through remote monitoring and control. The challenges and future directions of research are also discussed.
13
Stocker MD, Pachepsky YA, Hill RL. Prediction of E. coli Concentrations in Agricultural Pond Waters: Application and Comparison of Machine Learning Algorithms. Front Artif Intell 2022; 4:768650. [PMID: 35088045 PMCID: PMC8787305 DOI: 10.3389/frai.2021.768650] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/01/2021] [Accepted: 12/13/2021] [Indexed: 11/13/2022]
Abstract
The microbial quality of irrigation water is an important issue, as the use of contaminated water has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physicochemical water quality parameters. However, these relationships are often non-linear and change above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for the prediction of E. coli in agricultural pond waters. Two ponds in Maryland were monitored during the irrigation seasons from 2016 to 2018. E. coli concentrations, along with 12 other water quality parameters, were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. In almost all cases, the RF model provided the lowest RMSE for predicted E. coli concentrations in both ponds, both for individual years and over consecutive years. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU 100 mL-1) ranged from 0.244 to 0.346 for Pond 1 and from 0.304 to 0.418 for Pond 2. For the 3-year datasets, these values were 0.334 and 0.381 for Pond 1 and Pond 2, respectively. In most cases, there was no significant difference (P > 0.05) between the RMSE of RF and that of the other ML models when these RMSE values were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not differ significantly when 5 predictors were used instead of 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
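The abstract reports RMSE in log10 CFU 100 mL-1 from 10-fold cross-validation repeated five times. A minimal sketch of that evaluation loop follows. It does not reproduce the RF, SGB, SVM, or kNN models: the one-predictor least-squares model passed in is a hypothetical stand-in, and drawing an independent random partition for each repeat is an assumption, since the abstract does not describe how the folds were formed.

```python
import math
import random
from typing import Callable, List, Sequence

# fit_predict(x_train, y_train, x_test) -> predictions for x_test
FitPredict = Callable[[List[float], List[float], List[float]], List[float]]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error in the units of the inputs (e.g. log10 CFU 100 mL-1)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def repeated_kfold_rmse(x: Sequence[float], y: Sequence[float], fit_predict: FitPredict,
                        k: int = 10, repeats: int = 5, seed: int = 0) -> List[float]:
    """Return one held-out RMSE per fold: k * repeats values in total."""
    n = len(y)
    if n < k:
        raise ValueError("need at least k observations")
    rng = random.Random(seed)
    scores: List[float] = []
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)  # an independent random partition for each repeat
        for f in range(k):
            test_idx = order[f::k]  # every k-th shuffled index forms fold f
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            preds = fit_predict([x[i] for i in train_idx], [y[i] for i in train_idx],
                                [x[i] for i in test_idx])
            scores.append(rmse([y[i] for i in test_idx], preds))
    return scores


def least_squares_fit_predict(x_train: List[float], y_train: List[float],
                              x_test: List[float]) -> List[float]:
    """Hypothetical stand-in model: ordinary least squares on a single predictor."""
    mx = sum(x_train) / len(x_train)
    my = sum(y_train) / len(y_train)
    sxx = sum((a - mx) ** 2 for a in x_train)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_train, y_train))
    slope = sxy / sxx if sxx else 0.0
    return [my + slope * (a - mx) for a in x_test]
```

With the default settings each model yields 50 fold-level RMSE values, and two models evaluated with the same seed see identical folds, so their RMSE lists can be compared pairwise. The abstract does not name the significance test used for that comparison.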
Affiliation(s)
- Matthew D. Stocker
- Environmental Microbial and Food Safety Laboratory, United States Department of Agriculture–Agricultural Research Service, Beltsville, MD, United States
- Oak Ridge Institute for Science and Education, Oak Ridge, TN, United States
- Department of Environmental Science and Technology, University of Maryland, College Park, MD, United States
- Yakov A. Pachepsky
- Environmental Microbial and Food Safety Laboratory, United States Department of Agriculture–Agricultural Research Service, Beltsville, MD, United States
- Robert L. Hill
- Department of Environmental Science and Technology, University of Maryland, College Park, MD, United States