1
Ryu YH, Min SK. Leveraging physics-based and explainable machine learning approaches to quantify the relative contributions of rain and air pollutants to wet deposition. Sci Total Environ 2024; 931:172980. [PMID: 38705308 DOI: 10.1016/j.scitotenv.2024.172980] [Received: 03/11/2024] [Revised: 05/02/2024] [Accepted: 05/02/2024] [Indexed: 05/07/2024]
Abstract
A quantitative understanding of the roles of rainfall and pollutant concentrations in wet deposition is important because they critically influence terrestrial and aquatic ecosystems. However, their relative contributions to wet deposition, which vary across regions, have not yet been identified. We propose two methods that quantitatively separate the contributions of rain and pollutant concentrations to wet deposition: one is based on simplified equations describing the wet scavenging of pollutants, and the other is based on random forest models employing SHapley Additive exPlanations. Three-dimensional long-term air quality simulations from 2003 to 2019 are used as inputs for both the physics-based and machine learning models. Remarkably, the results drawn from the explainable machine learning model are consistent with those from the physics-based approach: overall, rain is a more important limiting factor than pollutant concentrations, and the relative contribution of rain is larger than that of pollutants by up to a factor of 3-4 in polluted regions. In polluted regions, pollutant concentrations can remain relatively high even in the presence of precipitation owing to continuous and intense emissions; therefore, wet deposition is limited by rainfall. Even in regions with low emissions, the contribution of rainfall is larger than that of pollutant concentrations by a factor of 1.5-2.5, and this considerably large role of rain suggests that regional or transboundary pollutant transport plays a key role in modulating wet deposition. However, in very remote regions, once the rainfall amount exceeds a certain value, rainfall no longer contributes to increasing wet deposition because atmospheric pollutants are readily removed by rain. Consequently, the contributions of the two factors are comparable in pristine regions.
Our results can serve as a basis for explaining interannual variations in wet deposition and for future projections of wet deposition under emission control plans and climate change scenarios across regions.
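To make the idea of splitting wet deposition between rain and pollutant concentration concrete, the following minimal sketch computes exact Shapley values for a two-feature toy function. It is not the random forest or physics-based model from the abstract: the product form `rain * concentration`, the baseline, and the sample values are all assumptions chosen only for illustration.

```python
from itertools import permutations


def shapley_values(value_fn, x, baseline):
    """Exact Shapley values of value_fn at x relative to a baseline.

    Features that are "absent" from a coalition keep their baseline value.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)        # start with every feature at baseline
        previous = value_fn(current)
        for name in order:
            current[name] = x[name]     # switch this feature to its actual value
            updated = value_fn(current)
            totals[name] += updated - previous  # marginal contribution
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_deposition(features):
    # Hypothetical toy model: deposition proportional to rain x concentration.
    return features["rain"] * features["concentration"]


baseline = {"rain": 2.0, "concentration": 5.0}  # assumed reference conditions
sample = {"rain": 6.0, "concentration": 7.0}    # assumed sample to explain
phi = shapley_values(toy_deposition, sample, baseline)
print(phi)
```

In this toy case the two attributions add up to the difference between the sample and baseline outputs, which is the additivity property that makes relative contributions comparable across features.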
Affiliation(s)
- Young-Hee Ryu
  - Department of Atmospheric Sciences, Yonsei University, Seoul 03722, Republic of Korea.
- Seung-Ki Min
  - Division of Environmental Science and Engineering, Pohang University of Science and Technology (POSTECH), Pohang 37673, Republic of Korea; Institute for Convergence Research and Education in Advanced Technology, Yonsei University, Incheon, Republic of Korea
2
Wu C, Liang Y, Jiang S, Shi Z. Mechanistic and data-driven perspectives on plant uptake of organic pollutants. Sci Total Environ 2024; 929:172415. [PMID: 38631647 DOI: 10.1016/j.scitotenv.2024.172415] [Received: 02/17/2024] [Revised: 04/09/2024] [Accepted: 04/10/2024] [Indexed: 04/19/2024]
Abstract
Establishing reliable predictive models for plant uptake of organic pollutants is crucial for environmental risk assessment and guiding phytoremediation efforts. This study compiled an expanded dataset of plant cuticle-water partition coefficients (Kcw), a useful indicator for plant uptake, comprising 371 data points for 148 unique compounds and various plant species. Quantum/computational chemistry software and tools were used to compute various molecular descriptors, aiming to comprehensively characterize the properties and structures of each compound. Three types of models were developed to predict Kcw: a mechanism-driven pp-LFER model, a data-driven machine learning model, and an integrated mechanism-data-driven model. The mechanism-data-driven GBRT-ppLFER model exhibited superior performance, achieving RMSEtrain = 0.133 and RMSEtest = 0.301 while maintaining interpretability. The Shapley Additive Explanation analysis indicated that pp-LFER parameters, ESPI, FwRadicalmax, ExtFP607, and RDF70s are the key factors influencing plant uptake in the GBRT-ppLFER model. Overall, the pp-LFER parameter, ESPI, and ExtFP607 show positive effects, while the remaining factors exhibit negative effects. Partial dependency analysis further indicated that plant uptake is not solely determined by individual factors but rather by the combined interactions of multiple factors. Specifically, compounds with pp-LFER parameter > 4, ESPI > -25.5, 0.098 < FwRadicalmax < 0.132, and 2 < RDF70s < 3 are generally more readily taken up by plants. In addition, the predicted Kcw values from the GBRT-ppLFER model were effectively employed to estimate the plant-water partition coefficients and bioconcentration factors across different plant species and growth media (water, sand, and soil), achieving an outstanding performance with an RMSE of 0.497. This study provides effective tools for assessing plant uptake of organic pollutants and deepens our understanding of plant-environment-compound interactions.
Affiliation(s)
- Chunya Wu
  - School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
- Yuzhen Liang
  - School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China.
- Shan Jiang
  - School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
- Zhenqing Shi
  - School of Environment and Energy, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China; The Key Lab of Pollution Control and Ecosystem Restoration in Industry Clusters, Ministry of Education, South China University of Technology, Guangzhou, Guangdong 510006, People's Republic of China
3
Bacanin N, Perisic M, Jovanovic G, Damaševičius R, Stanisic S, Simic V, Zivkovic M, Stojic A. The explainable potential of coupling hybridized metaheuristics, XGBoost, and SHAP in revealing toluene behavior in the atmosphere. Sci Total Environ 2024; 929:172195. [PMID: 38631643 DOI: 10.1016/j.scitotenv.2024.172195] [Received: 12/02/2023] [Revised: 04/01/2024] [Accepted: 04/01/2024] [Indexed: 04/19/2024]
Abstract
Toluene is a neurotoxic aromatic hydrocarbon and one of the major representatives of volatile organic compounds, known for its abundance, adverse health effects, and role in the formation of other atmospheric pollutants like ozone. This research introduces an enhanced version of the reptile search metaheuristic algorithm, which was used to tune the extreme gradient boosting hyperparameters in order to investigate toluene atmospheric behavior patterns and interactions with other polluting species within defined environmental conditions. The study is based on a two-year database encompassing hourly concentrations of inorganic gaseous contaminants (NO, NO2, NOx, and O3), particulate matter fractions (PM1, PM2.5, and PM10), m,p-xylene, toluene, benzene, total non-methane hydrocarbons, and meteorological data. The experimental outcomes were validated against the results of extreme gradient boosting models optimized by seven other recent powerful metaheuristic algorithms. The best-performing model was interpreted using the Shapley additive explanations method. We focused on the relationship between toluene and benzene, its most important predictor, and provided a detailed description of the environmental conditions that directed their interactions.
Affiliation(s)
- Nebojsa Bacanin
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia; Sinergija University, Raje Banjicica, Bjeljina 76300, Bosnia and Herzegovina.
- Mirjana Perisic
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia; Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, Belgrade 11010, Serbia.
- Gordana Jovanovic
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia; Institute of Physics Belgrade, University of Belgrade, Pregrevica 118, Belgrade 11010, Serbia.
- Robertas Damaševičius
  - Centre of Real Time Computer Systems, Kaunas University of Technology, Barsausko 59, Kaunas 51423, Lithuania.
- Svetlana Stanisic
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia.
- Vladimir Simic
  - Faculty of Transport and Traffic Engineering, University of Belgrade, Vojvode Stepe 305, Belgrade 44249, Serbia; Yuan Ze University, College of Engineering, Department of Industrial Engineering and Management, Taoyuan City 320315, Taiwan; Department of Computer Science and Engineering, College of Informatics, Korea University, Seoul 02841, Republic of Korea.
- Miodrag Zivkovic
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia.
- Andreja Stojic
  - Informatics and Computing, Singidunum University, Danijelova 32, Belgrade 11010, Serbia; Sinergija University, Raje Banjicica, Bjeljina 76300, Bosnia and Herzegovina.
4
Herceg Romanić S, Mendaš G, Fingler S, Drevenkar V, Mustać B, Jovanović G. Polychlorinated biphenyls in mussels, small pelagic fish, tuna, turtles, and dolphins from the Croatian Adriatic Sea waters: an overview of the last two decades of monitoring. Arh Hig Rada Toksikol 2024; 75:15-23. [PMID: 38548374 PMCID: PMC10978161 DOI: 10.2478/aiht-2024-75-3814] [Received: 01/01/2024] [Revised: 01/01/2024] [Accepted: 03/01/2024] [Indexed: 04/01/2024] Open
Abstract
This review summarises our two decades of polychlorinated biphenyl (PCB) monitoring in different marine organisms along the eastern Adriatic Sea. The aim was to gain an insight into the trends of PCB distribution in order to evaluate the effectiveness of past and current legislation and suggest further action. Here we mainly focus on PCB levels in wild and farmed Mediterranean mussels, wild and farmed bluefin tuna, loggerhead sea turtles, common bottlenose dolphins, and small pelagic fish. The use of artificial intelligence and advanced statistics enabled an insight into the influence of various variables on the uptake of PCBs in the investigated organisms as well as into their mutual dependence. Our findings suggest that PCBs in small pelagic fish and mussels reflect global pollution and that high levels in dolphins and wild tuna tissues raise particular concern, as they confirm their biomagnification up the food chain. Therefore, the ongoing PCB monitoring should focus on predatory species in particular to help us better understand PCB contamination in marine ecosystems in our efforts to protect the environment and human health.
Affiliation(s)
- Gordana Mendaš
  - Institute for Medical Research and Occupational Health, Zagreb, Croatia
- Sanja Fingler
  - Institute for Medical Research and Occupational Health, Zagreb, Croatia
- Vlasta Drevenkar
  - Institute for Medical Research and Occupational Health, Zagreb, Croatia
- Bosiljka Mustać
  - University of Zadar, Department of Ecology, Agronomy and Aquaculture, Zadar, Croatia
- Gordana Jovanović
  - University of Belgrade Institute of Physics, Belgrade, Serbia
  - Singidunum University, Belgrade, Serbia
5
Liu H, Sun BQ, Tang ZW, Qian SC, Zheng SQ, Wang QY, Shao YF, Chen JQ, Yang JN, Ding Y, Zhang HJ. Anti-inflammatory response-based risk assessment in acute type A aortic dissection: A national multicenter cohort study. Int J Cardiol Heart Vasc 2024; 50:101341. [PMID: 38313452 PMCID: PMC10835346 DOI: 10.1016/j.ijcha.2024.101341] [Received: 11/16/2023] [Revised: 01/03/2024] [Accepted: 01/10/2024] [Indexed: 02/06/2024]
Abstract
Background: Early identification of patients at high risk of operative mortality is important for acute type A aortic dissection (TAAD). We aimed to investigate whether patients with distinct risk stratifications respond differently to anti-inflammatory pharmacotherapy. Methods: From 13 cardiovascular hospitals, 3110 surgically repaired TAAD patients were randomly divided into a training set (70%) and a test set (30%) to develop and validate a risk model to predict operative mortality using extreme gradient boosting. Performance was measured by the area under the receiver operating characteristic curve (AUC). Subgroup analyses were performed by risk stratification (low versus middle-high risk) and anti-inflammatory pharmacotherapy (absence versus presence of ulinastatin use). Results: A simplified risk model was developed for predicting operative mortality, consisting of the top ten features of importance: platelet-leukocyte ratio, D-dimer, activated partial thromboplastin time, urea nitrogen, glucose, lactate, base excess, hemoglobin, albumin, and creatine kinase-MB, which displayed a superior discrimination ability (AUC: 0.943, 95% CI 0.928-0.958 and 0.884, 95% CI 0.836-0.932) in the derivation and validation cohorts, respectively. Ulinastatin use was not associated with a decreased risk of operative mortality in any risk stratum; however, it was associated with a shorter mechanical ventilation duration among patients with middle-high risk (defined as risk probability > 5.0%) (β -1.6 h, 95% CI [-3.1, -0.1] h; P = 0.048). Conclusion: This risk model reflecting inflammatory, coagulation, and metabolic pathways achieved acceptable predictive performance for operative mortality following TAAD surgery, which will contribute to individualized anti-inflammatory pharmacotherapy.
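As a rough illustration of the evaluation described in this abstract, the sketch below shows a seeded 70/30 split of sample indices and a pairwise computation of the AUC (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one). The function names, labels, and scores are hypothetical and are not taken from the study; a real analysis would normally rely on an established library.

```python
import random


def split_indices(n, train_fraction=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


train_idx, test_idx = split_indices(10)

# Made-up outcome labels (1 = event) and predicted risks for six cases.
labels = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
auc = roc_auc(labels, scores)
print(len(train_idx), len(test_idx), round(auc, 3))
```

The pairwise form is quadratic in the number of cases, which is acceptable for a sketch but not for cohorts of thousands of patients.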
Affiliation(s)
- Hong Liu
  - Department of Cardiovascular Surgery, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, PR China
- Bing-Qi Sun
  - Department of Cardiovascular Surgery, Teda International Cardiovascular Hospital, Tianjin 300457 PR China
- Zhi-Wei Tang
  - Department of Cardiovascular Surgery, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, PR China
- Si-Chong Qian
  - Department of Cardiovascular Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing 100029, PR China
- Si-Qiang Zheng
  - Department of Thoracic Surgery, Shanghai Pulmonary Hospital, Tongji University School of Medicine, Shanghai 200433, PR China
- Qing-Yuan Wang
  - Department of Cardiovascular Surgery, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, PR China
- Yong-Feng Shao
  - Department of Cardiovascular Surgery, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, PR China
- Jun-Quan Chen
  - Department of Cardiovascular Surgery, Tianjin Chest Hospital, Tianjin Medical University, Tianjin 300222, PR China
- Ji-Nong Yang
  - Department of Cardiovascular Surgery, Affiliated Hospital of Qingdao University, Qingdao 266003, PR China
- Yi Ding
  - Department of Cardiovascular Surgery, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, PR China
- Hong-Jia Zhang
  - Department of Cardiovascular Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing 100029, PR China
6
Ma Z, Wang R, Song G, Zhang K, Zhao Z, Wang J. Interpretable ensemble prediction for anaerobic digestion performance of hydrothermal carbonization wastewater. Sci Total Environ 2024; 908:168279. [PMID: 37926246 DOI: 10.1016/j.scitotenv.2023.168279] [Received: 09/02/2023] [Revised: 10/12/2023] [Accepted: 10/31/2023] [Indexed: 11/07/2023]
Abstract
Hydrothermal carbonization (HTC) is a method to improve fuel quality that can directly treat wet solid waste, but the treatment produces large amounts of wastewater. Treating HTC wastewater by anaerobic digestion to produce methane can lead to waste utilization and energy saving. However, predicting the anaerobic digestion performance of HTC wastewater is challenging due to the complexity of the influencing factors. This study applies interpretable machine learning combined with ensemble learning to construct ensemble prediction models for the biogas yield and CH4 concentration. The machine learning ensemble model can integrate the advantages of single models and effectively improve the prediction accuracy of the anaerobic digestion performance of HTC wastewater, with the best R2 reaching 0.836 and 0.820, respectively, which is better than the 0.780 and 0.802 of the best single models. The SHapley Additive exPlanations theory is combined with the ensemble models to show that anaerobic digestion reaction time, together with HTC temperature, pH, and COD, has a coupling effect on daily biogas yield and CH4 concentration.
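The comparison between an ensemble and its single models can be sketched as follows. The abstract does not say how the ensemble combines its members, so plain averaging is an assumption here, and the observations and predictions are invented numbers, not data from the study.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ensemble(predictions):
    """Combine several models by averaging their predictions per sample."""
    return [sum(column) / len(column) for column in zip(*predictions)]


# Invented observations and two single-model predictions with opposite bias.
y_true = [1.0, 2.0, 3.0, 4.0]
model_a = [1.4, 2.4, 3.4, 4.4]   # overestimates by 0.4
model_b = [0.8, 1.8, 2.8, 3.8]   # underestimates by 0.2

ensemble = average_ensemble([model_a, model_b])
r2_a = r2_score(y_true, model_a)
r2_b = r2_score(y_true, model_b)
r2_ens = r2_score(y_true, ensemble)
print(round(r2_a, 3), round(r2_b, 3), round(r2_ens, 3))
```

Averaging helps in this toy case because the two models err in opposite directions; models that share the same bias would gain little from it.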
Affiliation(s)
- Zherui Ma
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
- Ruikun Wang
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China.
- Gaoke Song
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
- Kai Zhang
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
- Zhenghui Zhao
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China; Baoding Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
- Jiangjiang Wang
  - Hebei Key Laboratory of Low Carbon and High Efficiency Power Generation Technology, North China Electric Power University, Baoding 071003, Hebei, China
7
Herceg Romanić S, Milićević T, Jovanović G, Matek Sarić M, Mendaš G, Fingler S, Jakšić G, Popović A, Relić D. Persistent organic pollutants in Croatian breast milk: An overview of pollutant levels and infant health risk assessment from 1976 to the present. Food Chem Toxicol 2023; 179:113990. [PMID: 37597765 DOI: 10.1016/j.fct.2023.113990] [Received: 05/09/2023] [Revised: 07/03/2023] [Accepted: 08/14/2023] [Indexed: 08/21/2023]
Abstract
This review article summarizes our research on persistent organic pollutants (POPs) in human milk from Croatian mothers over the last few decades. Our studies make up the bulk of all POPs research in human milk in Croatia and show the state of the art in the research area. The first investigations were made in the 1970s. The aim of our review article is to document the comprehensive results over several decades as the best tool to (1) contribute to the understanding of POPs and their potential health risks, (2) evaluate the effectiveness of legislative bans and restrictions on human exposure to POPs in Croatia, and (3) suggest further actions. In our review we discuss: (1) human milk between 2011 and 2014, covering the evaluation of interrelations of organochlorine pesticides (OCP) and polychlorinated biphenyls (PCB) in human milk and their association with the mother's age and parity using artificial intelligence methods, together with our as yet unpublished research data on health risks for infants assessed through daily PCB and OCP intake; (2) time trends of PCB and OCP in human milk between 1976 and 2014; and (3) polychlorinated dibenzo-p-dioxins and polychlorinated dibenzofurans (PCDD/F) in human milk in 2000, together with as yet unpublished data on PCDD/F and polybrominated diphenyl ethers (PBDE) in 2014.
Affiliation(s)
- Snježana Herceg Romanić
  - Institute for Medical Research and Occupational Health, Ksaverska Cesta 2, 10001, Zagreb, Croatia
- Tijana Milićević
  - Environmental Physics Laboratory, Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11080, Belgrade, Serbia
- Gordana Jovanović
  - Environmental Physics Laboratory, Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11080, Belgrade, Serbia; Singidunum University, Danijelova 32, 11000, Belgrade, Serbia
- Marijana Matek Sarić
  - Department of Health Studies, University of Zadar, Splitska 1, 23000, Zadar, Croatia
- Gordana Mendaš
  - Institute for Medical Research and Occupational Health, Ksaverska Cesta 2, 10001, Zagreb, Croatia.
- Sanja Fingler
  - Institute for Medical Research and Occupational Health, Ksaverska Cesta 2, 10001, Zagreb, Croatia
- Goran Jakšić
  - Aquatika-Freshwater Aquarium Karlovac, Ulica Branka Čavlovića Čavleka 1/A, 47000, Karlovac, Croatia
- Aleksandar Popović
  - University of Belgrade - Faculty of Chemistry, Studentski Trg 12-16, 11000, Belgrade, Serbia
- Dubravka Relić
  - University of Belgrade - Faculty of Chemistry, Studentski Trg 12-16, 11000, Belgrade, Serbia
8
Lee SW, Lee EH, Choi IC. An ensemble machine learning approach to predict postoperative mortality in older patients undergoing emergency surgery. BMC Geriatr 2023; 23:262. [PMID: 37131138 PMCID: PMC10155414 DOI: 10.1186/s12877-023-03969-0] [Received: 12/14/2022] [Accepted: 04/13/2023] [Indexed: 05/04/2023] Open
Abstract
BACKGROUND: Prediction of preoperative frailty risk in the emergency setting is challenging because preoperative evaluation cannot be done sufficiently. In a previous study, a preoperative frailty risk prediction model that used only diagnostic and operation codes for emergency surgery showed poor predictive performance. This study developed a preoperative frailty prediction model using machine learning techniques that can be used in various clinical settings with improved predictive performance. METHODS: This is a national cohort study including 22,448 patients older than 75 years who visited the hospital for emergency surgery, drawn from the cohort of older patients in the sample retrieved from the Korean National Health Insurance Service. The diagnostic and operation codes were one-hot encoded and entered into the predictive model, with extreme gradient boosting (XGBoost) as the machine learning technique. The predictive performance of the model for postoperative 90-day mortality was compared with that of previous frailty evaluation tools, such as the Operation Frailty Risk Score (OFRS) and the Hospital Frailty Risk Score (HFRS), using receiver operating characteristic curve analysis. RESULTS: The c-statistics of XGBoost, OFRS, and HFRS for postoperative 90-day mortality were 0.840, 0.607, and 0.588, respectively. CONCLUSIONS: Using XGBoost with diagnostic and operation codes to predict postoperative 90-day mortality improved prediction performance significantly over previous risk assessment models such as OFRS and HFRS.
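The encoding step mentioned in the methods can be sketched as below: each patient's set of codes becomes a row of 0/1 indicators, one column per distinct code. The codes shown are illustrative placeholders rather than codes from the study, and the helper name is hypothetical. Because a patient may carry several codes, a row can contain more than one 1.

```python
def one_hot_encode(records):
    """Turn per-patient code lists into 0/1 indicator rows.

    Returns the sorted vocabulary of codes and one row per patient,
    with a 1 in every column whose code the patient has.
    """
    vocabulary = sorted({code for codes in records for code in codes})
    column = {code: i for i, code in enumerate(vocabulary)}
    rows = []
    for codes in records:
        row = [0] * len(vocabulary)
        for code in codes:
            row[column[code]] = 1
        rows.append(row)
    return vocabulary, rows


# Placeholder diagnostic/operation codes for three hypothetical patients.
patients = [["I21", "K35"], ["K35"], ["S72", "I21"]]
vocabulary, rows = one_hot_encode(patients)
print(vocabulary)
print(rows)
```

With thousands of distinct codes the resulting matrix is mostly zeros, so a sparse representation would be the usual choice before passing it to a gradient boosting library.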
Affiliation(s)
- Sang-Wook Lee
  - Department of Anesthesiology and Pain Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Seoul, 05505, Songpa-gu, Republic of Korea
- Eun-Ho Lee
  - University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Seoul, 05505, Songpa-gu, Republic of Korea
- In-Cheol Choi
  - Department of Anesthesiology and Pain Medicine, Asan Medical Center, University of Ulsan College of Medicine, 88 Olympic-ro 43-gil, Seoul, 05505, Songpa-gu, Republic of Korea.
9
Jovanovic G, Perisic M, Bacanin N, Zivkovic M, Stanisic S, Strumberger I, Alimpic F, Stojic A. Potential of Coupling Metaheuristics-Optimized-XGBoost and SHAP in Revealing PAHs Environmental Fate. Toxics 2023; 11:394. [PMID: 37112620 PMCID: PMC10142005 DOI: 10.3390/toxics11040394] [Received: 02/25/2023] [Revised: 04/17/2023] [Accepted: 04/19/2023] [Indexed: 06/19/2023]
Abstract
Polycyclic aromatic hydrocarbons (PAHs) refer to a group of several hundred compounds, among which 16 are identified as priority pollutants due to their adverse health effects, frequency of occurrence, and potential for human exposure. This study focuses on benzo(a)pyrene, which is considered an indicator of exposure to a carcinogenic PAH mixture. For this purpose, we applied the XGBoost model to a two-year database of pollutant concentrations and meteorological parameters, with the aim of identifying the factors most strongly associated with the observed benzo(a)pyrene concentrations and describing the types of environments that supported the interactions between benzo(a)pyrene and other polluting species. The pollutant data were collected at the energy industry center in Serbia, in the vicinity of coal mining areas and power stations, where the observed maximum benzo(a)pyrene concentration for the study period reached 43.7 ng m-3. A metaheuristic algorithm was used to optimize the XGBoost hyperparameters, and the results were compared to those of XGBoost models tuned by eight other cutting-edge metaheuristic algorithms. The best-performing model was then interpreted by applying Shapley Additive exPlanations (SHAP). As indicated by mean absolute SHAP values, the temperature at the surface, arsenic, PM10, and total nitrogen oxide (NOx) concentrations appear to be the major factors affecting benzo(a)pyrene concentrations and its environmental fate.
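Ranking features by mean absolute SHAP value, as the abstract describes, amounts to averaging the magnitude of each feature's per-sample attribution. The sketch below assumes the SHAP values have already been computed by some explainer; the numbers in `shap_rows` are made up purely for illustration and do not come from the study.

```python
def rank_by_mean_abs_shap(feature_names, shap_rows):
    """Rank features by the mean of |SHAP value| across samples.

    shap_rows holds one list of per-feature SHAP values per sample.
    Returns (feature, importance) pairs, most important first.
    """
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


features = ["temperature", "arsenic", "PM10", "NOx"]
# Made-up SHAP values for three samples (rows) and four features (columns).
shap_rows = [
    [-2.0, 1.0, 0.5, -0.25],
    [1.0, -0.5, 0.5, 0.25],
    [-3.0, 1.5, -0.5, 0.25],
]
ranking = rank_by_mean_abs_shap(features, shap_rows)
print(ranking)
```

Taking absolute values before averaging matters: a feature that pushes predictions strongly in both directions would otherwise average out to near zero and appear unimportant.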
Affiliation(s)
- Gordana Jovanovic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Mirjana Perisic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Nebojsa Bacanin
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Miodrag Zivkovic
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Svetlana Stanisic
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Ivana Strumberger
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
- Filip Alimpic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
- Andreja Stojic
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 11000 Belgrade, Serbia
  - Faculty of Informatics and Computing, Singidunum University, 11000 Belgrade, Serbia
10
Narita K, Matsui Y, Matsushita T, Shirasaki N. Screening priority pesticides for drinking water quality regulation and monitoring by machine learning: Analysis of factors affecting detectability. J Environ Manage 2023; 326:116738. [PMID: 36375426 DOI: 10.1016/j.jenvman.2022.116738] [Received: 10/02/2022] [Revised: 11/01/2022] [Accepted: 11/06/2022] [Indexed: 06/16/2023]
Abstract
Proper selection of new contaminants to be regulated or monitored prior to implementation is an important issue for regulators and water supply utilities. Herein, we constructed and evaluated machine learning models for predicting the detectability (detection/non-detection) of pesticides in surface water used as drinking water sources. Classification and regression models were each constructed with Random Forest, XGBoost, and LightGBM; of these, the LightGBM classification model had the highest prediction accuracy. Furthermore, its prediction performance was superior in Recall, Precision, and F-measure to that of the detectability index method, which is based on runoff models from previous studies. Regardless of the type of machine learning model, the number of annual measurements, the quantity of pesticide sold for rice-paddy fields, and water quality guideline values were the most important model features (explanatory variables). Analysis of the impact of the features suggested the presence of a threshold (or range) above which the detectability increased. In addition, if a feature (e.g., quantity of pesticide sales) acted to increase the likelihood of detection beyond a threshold value, other features also synergistically affected detectability. The proportion of false positives and negatives varied depending on the features used. The superiority of the machine learning models lies in their ability to represent nonlinear and complex relationships between features and pesticide detectability that cannot be represented by existing risk scoring methods.
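The three classification measures named in the abstract follow directly from counts of true positives, false positives, and false negatives. The sketch below uses invented detection labels (1 = detected, 0 = not detected) and takes F-measure to mean the F1 score, which is an assumption since the abstract does not specify the weighting.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels where 1 = detected."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly predicted detections
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted but not detected
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # detected but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented ground truth and predictions for eight hypothetical pesticides.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
print(precision, recall, f1)
```

For screening purposes recall is often weighted more heavily, because a missed pesticide (false negative) is usually costlier than an unnecessary measurement.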
Affiliation(s)
- Kentaro Narita
  - Graduate School of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan
- Yoshihiko Matsui
  - Faculty of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan
- Taku Matsushita
  - Faculty of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan
- Nobutaka Shirasaki
  - Faculty of Engineering, Hokkaido University, N13W8, Sapporo, 060-8628, Japan
|
11
|
Shi Y, Zou Y, Liu J, Wang Y, Chen Y, Sun F, Yang Z, Cui G, Zhu X, Cui X, Liu F. Ultrasound-based radiomics XGBoost model to assess the risk of central cervical lymph node metastasis in patients with papillary thyroid carcinoma: Individual application of SHAP. Front Oncol 2022; 12:897596. [PMID: 36091102 PMCID: PMC9458917 DOI: 10.3389/fonc.2022.897596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022] Open
Abstract
Objectives A radiomics-based explainable eXtreme Gradient Boosting (XGBoost) model was developed to predict central cervical lymph node metastasis (CCLNM) in patients with papillary thyroid carcinoma (PTC), including positive and negative effects. Methods A total of 587 PTC patients admitted at Binzhou Medical University Hospital from 2017 to 2021 were analyzed retrospectively. The patients were randomized into the training and test cohorts with an 8:2 ratio. Radiomics features were extracted from ultrasound images of the primary PTC lesions. The minimum redundancy maximum relevance algorithm and the least absolute shrinkage and selection operator regression were used to select CCLNM positively-related features and radiomics scores were constructed. Clinical features, ultrasound features, and radiomics score were screened out by the Boruta algorithm, and the XGBoost model was constructed from these characteristics. SHapley Additive exPlanations (SHAP) was used for individualized and visualized interpretation. SHAP addressed the cognitive opacity of machine learning models. Results Eleven radiomics features were used to calculate the radiomics score. Five critical elements were used to build the XGBoost model: capsular invasion, radiomics score, diameter, age, and calcification. The area under the curve was 91.53% and 90.88% in the training and test cohorts, respectively. SHAP plots showed the influence of each parameter on the XGBoost model, including positive (i.e., capsular invasion, radiomics score, diameter, and calcification) and negative (i.e., age) impacts. The XGBoost model outperformed the radiologist, increasing the AUC by 44%. Conclusions The radiomics-based XGBoost model predicted CCLNM in PTC patients. Visual interpretation using SHAP made the model an effective tool for preoperative guidance of clinical procedures, including positive and negative impacts.
Affiliation(s)
- Yan Shi
  - Binzhou Medical University Hospital, Binzhou, China
- Ying Zou
  - First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
  - National Clinical Research Center for Chinese Medicine Acupuncture and Moxibustion, Tianjin, China
- Jihua Liu
  - First Teaching Hospital of Tianjin University of Traditional Chinese Medicine, Tianjin, China
  - National Clinical Research Center for Chinese Medicine Acupuncture and Moxibustion, Tianjin, China
- Fang Sun
  - Binzhou Medical University Hospital, Binzhou, China
- Zhi Yang
  - Binzhou Medical University Hospital, Binzhou, China
- Guanghe Cui
  - Binzhou Medical University Hospital, Binzhou, China
- Xijun Zhu
  - Binzhou Medical University Hospital, Binzhou, China
- Xu Cui
  - Binzhou Medical University Hospital, Binzhou, China
- Feifei Liu
  - Binzhou Medical University Hospital, Binzhou, China
  - Peking University People’s Hospital, Beijing, China
  - *Correspondence: Feifei Liu
|
12
|
Ma P, Liu R, Gu W, Dai Q, Gan Y, Cen J, Shang S, Liu F, Chen Y. Construction and Interpretation of Prediction Model of Teicoplanin Trough Concentration via Machine Learning. Front Med (Lausanne) 2022; 9:808969. [PMID: 35360734 PMCID: PMC8963816 DOI: 10.3389/fmed.2022.808969] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/04/2021] [Accepted: 01/25/2022] [Indexed: 02/02/2023] Open
Abstract
Objective To establish an optimal model to predict teicoplanin trough concentrations by machine learning, and to explain the feature importance in the prediction model using the SHapley Additive exPlanation (SHAP) method. Methods A retrospective study was performed on 279 therapeutic drug monitoring (TDM) measurements obtained from 192 patients who were treated with teicoplanin intravenously at the First Affiliated Hospital of Army Medical University from November 2017 to July 2021. This study included 27 variables, and the teicoplanin trough concentration was the target variable. The whole dataset was divided into a training group and a testing group at a ratio of 8:2, and predictive performance was compared among six different algorithms. The three algorithms with the highest model performance were selected to establish the ensemble prediction model, and SHAP was employed to interpret the model. Results Three algorithms (SVR, GBRT, and RF) with high R² scores (0.676, 0.670, and 0.656, respectively) were selected to construct the ensemble model at a ratio of 6:3:1. The resulting model, with R² = 0.720, MAE = 3.628, MSE = 22.571, absolute accuracy of 83.93%, and relative accuracy of 60.71%, performed better in model fitting and had better prediction accuracy than any single algorithm. The feature importance and direction of each variable were visually demonstrated by SHAP values, in which teicoplanin administration and renal function were the most important factors. Conclusion For the first time, we adopted a machine learning approach to predict the teicoplanin trough concentration and interpreted the prediction model by the SHAP method, which is of great significance and value for clinical medication guidance.
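One plausible reading of an ensemble built "at the ratio of 6:3:1" is a weighted average of the three models' predictions. The sketch below assumes that reading; the abstract does not state how the ratio was applied, and the prediction values and function name are hypothetical.

```python
def weighted_ensemble(predictions, weights):
    """Blend per-model prediction lists by a weighted average.

    predictions: one list of predictions per model, all of equal length.
    weights: one non-negative weight per model (e.g. 6, 3, 1).
    """
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical trough-concentration predictions from three models.
svr_pred = [10.0, 20.0]
gbrt_pred = [12.0, 18.0]
rf_pred = [8.0, 22.0]

# Assumed interpretation of the 6:3:1 ratio as SVR:GBRT:RF weights.
blended = weighted_ensemble([svr_pred, gbrt_pred, rf_pred], [6, 3, 1])
print(blended)
```

With these inputs the best single model dominates the blend while the other two still pull each prediction slightly toward their own estimates.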
Affiliation(s)
- Pan Ma
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Ruixiang Liu
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Wenrui Gu
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Qing Dai
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Yu Gan
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Jing Cen
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Shenglan Shang
  - Department of Clinical Pharmacy, General Hospital of Central Theater Command of PLA, Wuhan, China
- Fang Liu
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
- Yongchuan Chen
  - Department of Pharmacy, The First Affiliated Hospital of Third Military Medical University (Army Medical University), Chongqing, China
|
13
|
Stojić A, Jovanović G, Stanišić S, Romanić SH, Šoštarić A, Udovičić V, Perišić M, Milićević T. The PM2.5-bound polycyclic aromatic hydrocarbon behavior in indoor and outdoor environments, part II: Explainable prediction of benzo[a]pyrene levels. Chemosphere 2022; 289:133154. [PMID: 34871609 DOI: 10.1016/j.chemosphere.2021.133154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2021] [Revised: 11/24/2021] [Accepted: 12/02/2021] [Indexed: 06/13/2023]
Abstract
Among the polycyclic aromatic hydrocarbons (PAHs), benzo[a]pyrene (B[a]P) has been considered more relevant than other species when estimating potential exposure-related health effects and has been recognized as a marker of the carcinogenic potency of air pollutant mixtures. The current understanding of the factors that govern the non-linear behavior of B[a]P and associated pollutants and environmental processes is insufficient, and further research must rely on advanced analytical approaches that avoid the assumptions and simplifications required by linear modeling methods. In this study, we employed eXtreme Gradient Boosting (XGBoost), the SHapley Additive exPlanations (SHAP) attribution method, and SHAP value fuzzy clustering to investigate the concentrations of inorganic gaseous pollutants, radon, PM2.5 and particle constituents including trace metals, ions, 16 US EPA priority PM2.5-bound PAHs, and 31 meteorological variables as key factors that shape the indoor and outdoor PM2.5-bound B[a]P distribution in a university building located in the urban area of Belgrade (Serbia). According to the results, the indoor and outdoor B[a]P levels were highly correlated and mostly influenced by the concentrations of Chry, B[b]F, CO, B[a]A, I[cd]P, B[k]F, Flt, D[ah]A, Pyr, B[ghi]P, Cr, As, and PM2.5 in both indoor and outdoor environments. In addition, high B[a]P concentration events were recorded during periods of low ambient temperature (<12 °C) and unstable weather conditions with precipitation and increased soil humidity.
Affiliation(s)
- Andreja Stojić
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia; Singidunum University, 32 Danijelova Street, 11000, Belgrade, Serbia
- Gordana Jovanović
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia; Singidunum University, 32 Danijelova Street, 11000, Belgrade, Serbia
- Svetlana Stanišić
  - Singidunum University, 32 Danijelova Street, 11000, Belgrade, Serbia
- Snježana Herceg Romanić
  - Institute for Medical Research and Occupational Health, 2 Ksaverska Cesta Street, PO Box 291, 10001, Zagreb, Croatia
- Andrej Šoštarić
  - Institute of Public Health Belgrade, 54 Despota Stefana Street, 11000, Belgrade, Serbia
- Vladimir Udovičić
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia
- Mirjana Perišić
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia; Singidunum University, 32 Danijelova Street, 11000, Belgrade, Serbia
- Tijana Milićević
  - Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia
|
14
|
|
15
|
Wang C, Feng L, Qi Y. Explainable deep learning predictions for illness risk of mental disorders in Nanjing, China. Environ Res 2021; 202:111740. [PMID: 34329635 DOI: 10.1016/j.envres.2021.111740] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/30/2021] [Revised: 07/16/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
Epidemiological studies have revealed the associations of air pollutants and meteorological factors with a range of mental health conditions. However, little is known about local explanations and global understanding of the importance and effect of input features in the complex system linking environmental stressors and mental disorders (MDs), especially for exposure to air pollution mixtures. In this study, we combined deep learning neural networks (DLNNs) with SHapley Additive exPlanation (SHAP) to predict the illness risk of MDs at the population level, and then provided explanations for risk factors. The modeling system, which was trained on day-by-day hospital outpatient visits of two major hospitals in Nanjing, China from 2013/07/01 through 2019/02/28, visualized the time-varying prediction, contributing factors, and interaction effects of informative features. Our results suggested that NO2, SO2, and CO made the largest contributions in magnitude of feature attributions under mixed air pollutant conditions. In particular, NO2 at high concentration levels was associated with an increase in illness risk of MDs, and the maximum and mean absolute SHAP values were approximately 10 and 2 as local and global measures of feature importance, respectively. A marginally antagonistic effect was found for two pairs of gaseous pollutants, i.e., NO2 vs. SO2 and CO vs. NO2. In contrast, CO and SO2 displayed feature effects in the opposite direction to the rise of observed concentrations, but a clear synergistic effect was captured. The primary risk factors driving a sharp increase in acute attack or exacerbation of MDs were also identified by depicting prediction paths of time-series samples. We believe that coupling accurate predictions from DLNNs with interpretable explanations of why a prediction is made has broad applicability throughout the field of environmental health.
Affiliation(s)
- Ce Wang
  - School of Energy and Environment, Southeast University, Nanjing, 210096, PR China; State Key Laboratory of Environmental Medicine Engineering, Ministry of Education, Southeast University, Nanjing, 210096, PR China
- Lan Feng
  - National-Provincial Joint Engineering Research Center of Electromechanical Product Packaging, College of Civil Engineering, Nanjing Forestry University, Nanjing, 210037, PR China
- Yi Qi
  - School of Architecture and Urban Planning, Nanjing University, No. 22 Hankoulu Road, Nanjing, 210093, PR China
|
16
|
Wang F, Wang Y, Zhang K, Hu M, Weng Q, Zhang H. Spatial heterogeneity modeling of water quality based on random forest regression and model interpretation. Environ Res 2021; 202:111660. [PMID: 34265353 DOI: 10.1016/j.envres.2021.111660] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 03/08/2021] [Revised: 06/28/2021] [Accepted: 07/04/2021] [Indexed: 06/13/2023]
Abstract
A systematic understanding of the spatial distribution of water quality is critical for successful watershed management; however, the limited number of physical monitoring stations has restricted the evaluation of spatial water quality distribution and the identification of features impacting the water quality. To fill this gap, we developed a modeling process that employed the random forest regression (RFR) to model the water quality distribution for the Taihu Lake basin in Zhejiang Province, China, and adopted the Shapley Additive exPlanations (SHAP) method to interpret the underlying driving forces. We first used RFR to model three water quality parameters: permanganate index (CODMn), total phosphorus (TP), and total nitrogen (TN), based on 16 watershed features. We then applied the built models to generate water quality distribution maps for the basin, with the CODMn ranging from 1.39 to 6.40 mg/L, TP from 0.02 to 0.23 mg/L, and TN from 1.43 to 4.27 mg/L. These maps showed generally consistent patterns among the CODMn, TN, and TP with minor differences in the spatial distribution. The SHAP analysis showed that the TN was mainly affected by agricultural non-point sources, while the CODMn and TP were affected by agricultural and domestic sources. Due to differences in sewage collection and treatment between urban and rural areas, the water quality in highly populated urban areas was better than that in rural areas, which led to an unexpected positive relationship between water quality and population density. Overall, with the RFR models and SHAP interpretation, we obtained a continuous distribution pattern of the water quality and identified its driving forces in the basin. These findings provided important information to assist water quality restoration projects.
Affiliation(s)
- Feier Wang
  - College of Environmental & Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Yixu Wang
  - College of Environmental & Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Kai Zhang
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, OH, 44106, United States
- Ming Hu
  - Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, OH, 44195, United States
- Qin Weng
  - College of Environmental & Resource Sciences, Zhejiang University, Hangzhou, Zhejiang, 310058, China
- Huichun Zhang
  - Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, OH, 44106, United States
|
17
|
|
18
|
Li R, Shinde A, Liu A, Glaser S, Lyou Y, Yuh B, Wong J, Amini A. Machine Learning-Based Interpretation and Visualization of Nonlinear Interactions in Prostate Cancer Survival. JCO Clin Cancer Inform 2021; 4:637-646. [PMID: 32673068 DOI: 10.1200/cci.20.00002] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 11/20/2022] Open
Abstract
PURPOSE Shapley additive explanation (SHAP) values represent a unified approach to interpreting predictions made by complex machine learning (ML) models, with superior consistency and accuracy compared with prior methods. We describe a novel application of SHAP values to the prediction of mortality risk in prostate cancer. METHODS Patients with nonmetastatic, node-negative prostate cancer, diagnosed between 2004 and 2015, were identified using the National Cancer Database. Model features were specified a priori: age, prostate-specific antigen (PSA), Gleason score, percent positive cores (PPC), comorbidity score, and clinical T stage. We trained a gradient-boosted tree model and applied SHAP values to model predictions. Open-source libraries in Python 3.7 were used for all analyses. RESULTS We identified 372,808 patients meeting the inclusion criteria. When analyzing the interaction between PSA and Gleason score, we demonstrated consistency with the literature using the example of low-PSA, high-Gleason prostate cancer, recently identified as a unique entity with a poor prognosis. When analyzing the PPC-Gleason score interaction, we identified a novel finding of stronger interaction effects in patients with Gleason ≥ 8 disease compared with Gleason 6-7 disease, particularly with PPC ≥ 50%. Subsequent confirmatory linear analyses supported this finding: 5-year overall survival in Gleason ≥ 8 patients was 87.7% with PPC < 50% versus 77.2% with PPC ≥ 50% (P < .001), compared with 89.1% versus 86.0% in Gleason 7 patients (P < .001), with a significant interaction term between PPC ≥ 50% and Gleason ≥ 8 (P < .001). CONCLUSION We describe a novel application of SHAP values for modeling and visualizing nonlinear interaction effects in prostate cancer. This ML-based approach is a promising technique with the potential to meaningfully improve risk stratification and staging systems.
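The Shapley values underlying SHAP can be illustrated with a minimal exact computation on a toy model. This sketch is not the study's pipeline: it enumerates all feature orderings (feasible only for a handful of features) and replaces "absent" features with a single baseline value, which simplifies the background-data averaging that SHAP implementations typically perform. The toy model and its inputs are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley attributions of model(instance) relative to model(baseline).

    For every ordering of the features, each feature is switched from its
    baseline value to its instance value in turn, and the resulting change in
    the model output is credited to that feature. Averaging over all
    orderings gives the Shapley value.
    """
    n = len(instance)
    phi = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        previous_output = model(current)
        for j in order:
            current[j] = instance[j]
            new_output = model(current)
            phi[j] += new_output - previous_output
            previous_output = new_output
    return [value / len(orders) for value in phi]


def toy_model(x):
    # Hypothetical risk score with an interaction term between two features.
    return 2 * x[0] + x[1] + x[0] * x[1]


phi = shapley_values(toy_model, instance=[1.0, 1.0], baseline=[0.0, 0.0])
print(phi)
```

The attributions sum to the difference between the model output at the instance and at the baseline, and the interaction term is split evenly between the two features involved.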
Affiliation(s)
- Richard Li
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
- Ashwin Shinde
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
- An Liu
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
- Scott Glaser
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
- Yung Lyou
  - Department of Medical Oncology, City of Hope Medical Center, Duarte, CA
- Bertram Yuh
  - Department of Urology, City of Hope Medical Center, Duarte, CA
- Jeffrey Wong
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
- Arya Amini
  - Department of Radiation Oncology, City of Hope Medical Center, Duarte, CA
|
19
|
Lv H, Yang X, Wang B, Wang S, Du X, Tan Q, Hao Z, Liu Y, Yan J, Xia Y. Machine Learning-Driven Models to Predict Prognostic Outcomes in Patients Hospitalized With Heart Failure Using Electronic Health Records: Retrospective Study. J Med Internet Res 2021; 23:e24996. [PMID: 33871375 PMCID: PMC8094022 DOI: 10.2196/24996] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/14/2020] [Revised: 01/04/2021] [Accepted: 03/16/2021] [Indexed: 01/16/2023] Open
Abstract
Background With the prevalence of cardiovascular diseases increasing worldwide, early prediction and accurate assessment of heart failure (HF) risk are crucial to meet the clinical demand. Objective Our study objective was to develop machine learning (ML) models based on real-world electronic health records to predict 1-year in-hospital mortality, use of positive inotropic agents, and 1-year all-cause readmission rate. Methods For this single-center study, we recruited patients with newly diagnosed HF hospitalized between December 2010 and August 2018 at the First Affiliated Hospital of Dalian Medical University (Liaoning Province, China). The models were constructed for a population set (90:10 split of data set into training and test sets) using 79 variables during the first hospitalization. Logistic regression, support vector machine, artificial neural network, random forest, and extreme gradient boosting models were investigated for outcome predictions. Results Of the 13,602 patients with HF enrolled in the study, 537 (3.95%) died within 1 year and 2779 patients (20.43%) had a history of use of positive inotropic agents. ML algorithms improved the performance of predictive models for 1-year in-hospital mortality (areas under the curve [AUCs] 0.92-1.00), use of positive inotropic medication (AUCs 0.85-0.96), and 1-year readmission rates (AUCs 0.63-0.96). A decision tree of mortality risk was created and stratified by single variables at levels of high-sensitivity cardiac troponin I (<0.068 μg/L), followed by percentage of lymphocytes (<14.688%) and neutrophil count (4.870×10⁹/L). Conclusions ML techniques based on a large scale of clinical variables can improve outcome predictions for patients with HF. The mortality decision tree may contribute to guiding better clinical risk assessment and decision making.
Affiliation(s)
- Haichen Lv
  - Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Xiaolei Yang
  - Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Bingyi Wang
  - Medical Department, Yidu Cloud (Beijing) Technology Co Ltd, Beijing, China
- Shaobo Wang
  - College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China; AI Lab, Yidu Cloud (Beijing) Technology Co Ltd, Beijing, China
- Xiaoyan Du
  - Medical Department, Yidu Cloud (Beijing) Technology Co Ltd, Beijing, China
- Qian Tan
  - Medical Department, Happy Life Technology Co Ltd, Beijing, China
- Zhujing Hao
  - Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Ying Liu
  - Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
- Jun Yan
  - AI Lab, Yidu Cloud (Beijing) Technology Co Ltd, Beijing, China
- Yunlong Xia
  - Department of Cardiology, The First Affiliated Hospital of Dalian Medical University, Dalian, China
|
20
|
Michael Y, Helman D, Glickman O, Gabay D, Brenner S, Lensky IM. Forecasting fire risk with machine learning and dynamic information derived from satellite vegetation index time-series. Sci Total Environ 2021; 764:142844. [PMID: 33158519 DOI: 10.1016/j.scitotenv.2020.142844] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 06/15/2020] [Revised: 09/30/2020] [Accepted: 10/01/2020] [Indexed: 05/21/2023]
Abstract
Fire risk mapping - mapping the probability of fire occurrence and spread - is essential for pre-fire management as well as for efficient firefighting efforts. Most fire risk maps are generated using static information on variables such as topography, vegetation density, and fuel instantaneous wetness. Satellites are often used to provide such information. However, long-term vegetation dynamics and the cumulative dryness status of the woody vegetation, which may affect fire occurrence and spread, are rarely considered in fire risk mapping. Here, we investigate the impact of two satellite-derived metrics that represent long-term vegetation status and dynamics on fire risk mapping - the long-term mean normalized difference vegetation index (NDVI) of the woody vegetation (NDVIW) and its trend (NDVIT). NDVIW represents the mean woody density at the grid cell, while NDVIT is the 5-year trend of the woody NDVI representing the long-term dryness status of the vegetation. To produce these metrics, we decompose time-series of satellite-derived NDVI following a method adjusted for Mediterranean woodlands and forests. We tested whether these metrics improve fire risk mapping using three machine learning (ML) algorithms (Logistic Regression, Random Forest, and XGBoost). We chose the 2007 wildfires in Greece for the analysis. Our results indicate that XGBoost, which accounts for variable interactions and non-linear effects, was the ML model that produced the best results. NDVIW improved the model performance, while NDVIT was significant only when NDVIW was high. This NDVIW-NDVIT interaction means that the long-term dryness effect is meaningful only in places of dense woody vegetation. The proposed method can produce more accurate fire risk maps than conventional methods and can supply important dynamic information that may be used in fire behavior models.
Affiliation(s)
- Yaron Michael
  - Department of Geography and Environment, Bar-Ilan University, Israel
- David Helman
  - Department of Soil and Water Sciences, The Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, P.O.B. 12, Rehovot 7610001, Israel; The Advanced School for Environmental Studies, The Hebrew University of Jerusalem, Jerusalem, Israel
- Oren Glickman
  - The Data Science Institute, Bar-Ilan University, Israel
- David Gabay
  - The Data Science Institute, Bar-Ilan University, Israel
- Steve Brenner
  - Department of Geography and Environment, Bar-Ilan University, Israel
- Itamar M Lensky
  - Department of Geography and Environment, Bar-Ilan University, Israel
|
21
|
Gujral H, Sinha A. Association between exposure to airborne pollutants and COVID-19 in Los Angeles, United States with ensemble-based dynamic emission model. Environ Res 2021; 194:110704. [PMID: 33417905 PMCID: PMC7836725 DOI: 10.1016/j.envres.2020.110704] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 09/02/2020] [Revised: 12/13/2020] [Accepted: 12/29/2020] [Indexed: 05/09/2023]
Abstract
This study aims to find the association between short-term exposure to air pollutants, such as particulate matter and ground-level ozone, and SARS-CoV-2 confirmed cases. Generalized linear models (GLM), a typical choice for ecological modeling, have well-established limitations. These limitations include a priori assumptions, an inability to handle multicollinearity, and the treatment of differential effects as fixed effects. We propose an Ensemble-based Dynamic Emission Model (EDEM) to address these limitations. EDEM is developed at the intersection of network science and ensemble learning, i.e., a specialized approach of machine learning. A Generalized Additive Model (GAM), i.e., a variant of GLM, and EDEM are tested in Los Angeles and Ventura counties of California, which is one of the biggest SARS-CoV-2 clusters in the US. GAM indicates that a 1 μg/m³, 1 μg/m³, and 1 ppm increase (lag 0-7) in PM2.5, PM10, and O3 is associated with a 4.51% (CI: 7.01 to -2.00) decrease, a 1.62% (CI: 2.23 to -1.022) decrease, and a 4.66% (CI: 0.85 to 8.47) increase in daily SARS-CoV-2 cases, respectively. Subsequent increments in lag resulted in a negative association between pollutants and SARS-CoV-2 cases. EDEM achieves R² scores of 90.96% and 79.16% on the training and testing datasets, respectively. EDEM confirmed the negative association between particulates and SARS-CoV-2 cases, whereas O3 shows a positive association; however, the positive association observed through GAM is not statistically significant. In addition, the county-level analysis of pollutant concentration interactions suggests that increased emissions from other counties positively affect SARS-CoV-2 cases in adjoining counties as well. The results reiterate the significance of uniformly adhering to air pollution mitigation strategies, especially those related to ground-level ozone.
Affiliation(s)
- Harshit Gujral
  - Department of Computer Science Engineering and IT, Jaypee Institute of Information Technology, Noida, India
- Adwitiya Sinha
  - Department of Computer Science Engineering and IT, Jaypee Institute of Information Technology, Noida, India
|
22
|
Stanišić S, Perišić M, Jovanović G, Milićević T, Romanić SH, Jovanović A, Šoštarić A, Udovičić V, Stojić A. The PM2.5-bound polycyclic aromatic hydrocarbon behavior in indoor and outdoor environments, part I: Emission sources. Environ Res 2021; 193:110520. [PMID: 33259787 DOI: 10.1016/j.envres.2020.110520] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/12/2020] [Revised: 11/17/2020] [Accepted: 11/20/2020] [Indexed: 06/12/2023]
Abstract
Previous research aimed at exploring the relationships between indoor and outdoor air quality has shown that outdoor PM2.5-bound polycyclic aromatic hydrocarbon (PAH) levels exhibit significant daily and seasonal variations which do not necessarily correspond with indoor PAH dynamics. For the purpose of this study, a three-month measurement campaign was performed simultaneously at indoor and outdoor sampling sites of a university building in an urban area of Belgrade (Serbia), during which the concentrations of O3, CO, SO2, NOx, radon, PM2.5 and particle constituents including trace metals (As, Cd, Cr, Mn, Ni and Pb), ions (Cl⁻, Na⁺, Mg²⁺, Ca²⁺, K⁺, NO₃⁻, SO₄²⁻ and NH₄⁺) and 16 US EPA priority PAHs were determined. Additionally, the analysis included 31 meteorological parameters, of which 24 were obtained from the Global Data Assimilation System (GDAS1) database. The Unmix and PAH diagnostic ratio analyses resolved the source profiles for both the indoor and outdoor environments, which are comparable in terms of their apportionments and pollutant shares, although ratio-implied solutions should be taken with caution since these values do not reflect emission sources only. The highest contributions to air quality were attributed to sources identified as coal combustion and related pyrogenic processes. Noticeable correlations were observed between 5- and 6-ring high molecular weight PAHs, but, except for CO, no significant linear dependencies with other investigated variables were identified. PAH levels in the indoor and outdoor environments were predicted using the XGBoost machine learning method.
Affiliation(s)
- Svetlana Stanišić
- Singidunum University, 32 Danijelova Street, Belgrade, 11000, Serbia.
- Mirjana Perišić
- Singidunum University, 32 Danijelova Street, Belgrade, 11000, Serbia; Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
- Gordana Jovanović
- Singidunum University, 32 Danijelova Street, Belgrade, 11000, Serbia; Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
- Tijana Milićević
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
- Snježana Herceg Romanić
- Institute for Medical Research and Occupational Health, 2 Ksaverska Cesta Street, PO Box 291, 10001, Zagreb, Croatia.
- Aleksandar Jovanović
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
- Andrej Šoštarić
- Institute of Public Health Belgrade, 54 Despota Stefana Street, 11000, Belgrade, Serbia.
- Vladimir Udovičić
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
- Andreja Stojić
- Singidunum University, 32 Danijelova Street, Belgrade, 11000, Serbia; Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, 118 Pregrevica Street, 11000, Belgrade, Serbia.
23
Vourganas I, Stankovic V, Stankovic L. Individualised Responsible Artificial Intelligence for Home-Based Rehabilitation. Sensors (Basel) 2020; 21:E2. [PMID: 33374913 PMCID: PMC7792599 DOI: 10.3390/s21010002] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/06/2020] [Revised: 12/09/2020] [Accepted: 12/17/2020] [Indexed: 01/23/2023]
Abstract
Socioeconomic reasons post-COVID-19 demand unsupervised home-based rehabilitation and, specifically, artificial ambient intelligence with individualisation to support engagement and motivation. Artificial intelligence must also comply with accountability, responsibility, and transparency (ART) requirements for wider acceptability. This paper presents such a patient-centric individualised home-based rehabilitation support system. To this end, the Timed Up and Go (TUG) and Five Time Sit To Stand (FTSTS) tests evaluate daily living activity performance in the presence or development of comorbidities. We present a method for generating synthetic datasets complementing experimental observations and mitigating bias. We present an incremental hybrid machine learning algorithm combining ensemble learning and hybrid stacking using extreme gradient boosted decision trees and k-nearest neighbours to meet individualisation, interpretability, and ART design requirements while maintaining a low computational footprint. The model reaches up to 100% accuracy for both FTSTS and TUG in predicting the associated patient medical condition, and 100% or 83.13%, respectively, in predicting the area of difficulty in the segments of the test. Our results show an improvement of 5% and 15% for FTSTS and TUG tests, respectively, over previous approaches that use intrusive means of monitoring such as cameras.
Affiliation(s)
- Ioannis Vourganas
- Department of Electronic and Electrical Engineering, University of Strathclyde, Glasgow G1 1XW, UK; (V.S.); (L.S.)
24
Zhang K, Zhang H. Coupling a Feedforward Network (FN) Model to Real Adsorbed Solution Theory (RAST) to Improve Prediction of Bisolute Adsorption on Resins. Environ Sci Technol 2020; 54:15385-15394. [PMID: 33187396 DOI: 10.1021/acs.est.0c03700] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/11/2023]
Abstract
When predicting bisolute adsorption, the adsorbed solution theory (AST) and real adsorbed solution theory (RAST) either frequently show high prediction deviations or require bisolute adsorption data. Emerging feedforward network (FN) models can provide high prediction accuracy but lack broad applicability. To avoid those limitations, adsorption experiments were performed for a total of 12 single solutes and 55 bisolutes onto two widely used resins (MN200 and XAD-4). Different FN-based models were then built and compared with AST and RAST, based on which a new modeling strategy coupling FN to RAST and requiring only single-solute data was proposed. The root-mean-square error (RMSE) of predictions by the FN-RAST is 0.082 log units for the adsorption of 50 bisolutes on MN200, much lower than that by AST (0.164) and slightly higher than that by RAST (0.069) or the best FN model (0.068). The FN-RAST model further provided satisfactory predictions for the adsorption of 5 bisolutes on XAD-4 (RMSE = 0.10), which is comparable to that by RAST (0.10) and much lower than those by AST (0.26) and the FN model (0.38). Therefore, the FN-RAST offers both satisfactory prediction accuracy and some broad applicability. The values of Abraham descriptors E and S were also found to help assess and compare the nonideal behavior in different bisolute mixtures.
Affiliation(s)
- Kai Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Huichun Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
25
Chemura A, Schauberger B, Gornott C. Impacts of climate change on agro-climatic suitability of major food crops in Ghana. PLoS One 2020; 15:e0229881. [PMID: 32598391 PMCID: PMC7323970 DOI: 10.1371/journal.pone.0229881] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/17/2020] [Accepted: 06/14/2020] [Indexed: 12/02/2022]
Abstract
Climate change is projected to impact food production stability in many tropical countries through impacts on crop potential. However, without quantitative assessments of where, by how much and to what extent crop production is possible now and under future climatic conditions, efforts to design and implement adaptation strategies under Nationally Determined Contributions (NDCs) and National Action Plans (NAP) are unsystematic. In this study, we used extreme gradient boosting, a machine learning approach, to model the current climatic suitability for maize, sorghum, cassava and groundnut in Ghana using yield data and agronomically important variables. We then used multi-model future climate projections for the 2050s and two greenhouse gas emissions scenarios (RCP 2.6 and RCP 8.5) to predict changes in the suitability range of these crops. We achieved a good model fit in determining suitability classes for all crops (AUC = 0.81–0.87). Precipitation-based factors are suggested as most important in determining crop suitability, though the importance is crop-specific. Under projected climatic conditions, optimal suitability areas will decrease for all crops except for groundnut under RCP 8.5 (no change: 0%), with the greatest losses for maize (12% under RCP 2.6 and 14% under RCP 8.5). Under current climatic conditions, 18% of Ghana has optimal suitability for two crops and 2% for three crops, with no area having optimal suitability for all four crops. Under projected climatic conditions, areas with optimal suitability for two and three crops will decrease by 12% as areas having moderate and marginal conditions for multiple crops increase. We also found that although the distribution of multiple crop suitability is spatially distinct, cassava and groundnut will be more simultaneously suitable for the south while groundnut and sorghum will be more suitable for the northern parts of Ghana under projected climatic conditions.
Affiliation(s)
- Abel Chemura
- Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany
- Bernhard Schauberger
- Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany
- Christoph Gornott
- Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany
26
Yan L, Diao Y, Lang Z, Gao K. Corrosion rate prediction and influencing factors evaluation of low-alloy steels in marine atmosphere using machine learning approach. Sci Technol Adv Mater 2020; 21:359-370. [PMID: 32939161 PMCID: PMC7476538 DOI: 10.1080/14686996.2020.1746196] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/20/2019] [Revised: 03/19/2020] [Accepted: 03/19/2020] [Indexed: 06/11/2023]
Abstract
Empirical modeling methods are widely used in corrosion behavior analysis. However, owing to the limited regression ability of conventional algorithms, modeling objects are often limited to individual factors and specific environments. This study proposed a modeling method based on machine learning to simulate the marine atmospheric corrosion behavior of low-alloy steels. The correlations between material, environmental factors and corrosion rate were evaluated, and their influences on the corrosion behavior of steels were analyzed intuitively. By using the selected dominating factors as input variables, an optimized random forest model was established with a high prediction accuracy of corrosion rate (R² values of 0.94 and 0.73 for the training and testing sets, respectively) for different low-alloy steel samples in several typical marine atmospheric environments. The results demonstrated that machine learning was efficient in corrosion behavior analysis, which usually involves a regression analysis of multiple factors.
Affiliation(s)
- Luchun Yan
- School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, China
- Yupeng Diao
- School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, China
- Zhaoyang Lang
- School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, China
- Kewei Gao
- School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing, China
- Beijing Advanced Innovation Center for Materials Genome Engineering, University of Science and Technology Beijing, Beijing, China
27
Zhang K, Zhong S, Zhang H. Predicting Aqueous Adsorption of Organic Compounds onto Biochars, Carbon Nanotubes, Granular Activated Carbons, and Resins with Machine Learning. Environ Sci Technol 2020; 54:7008-7018. [PMID: 32383863 DOI: 10.1021/acs.est.0c02526] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Indexed: 05/24/2023]
Abstract
Predictive models are useful tools for aqueous adsorption research; existing models such as multilinear regression (MLR), however, can only predict adsorption under specific equilibrium concentrations or for certain adsorption isotherm models. Also, few studies have discussed data processing beyond applying different modeling algorithms to improve the prediction accuracy. In this research, we employed a cosine similarity approach that focused on mining the available data before developing models; this approach can mine the most relevant data concerning the prediction target to build models and was found to considerably improve the prediction accuracy. We then built a machine-learning modeling process based on neural networks (NN), a group-selection data-splitting strategy for grouped adsorption data for adsorbent-adsorbate pairs under different equilibrium concentrations, and polyparameter linear free energy relationships (pp-LFERs) for aqueous adsorption of 165 organic compounds onto 50 biochars, 34 carbon nanotubes, 35 GACs, and 30 polymeric resins. The final NN-LFER models were successfully applied to various equilibrium concentrations regardless of the adsorption isotherm models and showed smaller prediction deviations than the published models, with root-mean-square errors of 0.23-0.31 versus 0.23-0.97 log units, and the predictions were improved by adding two key descriptors (BET surface area and pore volume) for the adsorbents. Finally, interpreting the NN-LFER models based on the Shapley values suggested that not considering equilibrium concentration and properties of the adsorbents in the existing MLR models is a possible reason for their higher prediction deviations.
Affiliation(s)
- Kai Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Shifa Zhong
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
- Huichun Zhang
- Department of Civil and Environmental Engineering, Case Western Reserve University, Cleveland, Ohio 44106, United States
28
De Clercq D, Wen Z, Fei F, Caicedo L, Yuan K, Shang R. Interpretable machine learning for predicting biomethane production in industrial-scale anaerobic co-digestion. Sci Total Environ 2020; 712:134574. [PMID: 31931191 DOI: 10.1016/j.scitotenv.2019.134574] [Citation(s) in RCA: 42] [Impact Index Per Article: 10.5] [Received: 07/09/2019] [Revised: 09/17/2019] [Accepted: 09/19/2019] [Indexed: 05/12/2023]
Abstract
The objective of this study is to apply machine learning models to accurately predict daily biomethane production in an industrial-scale co-digestion facility. The methodology involved applying elastic net, random forest, and extreme gradient boosting to input-output data from an industrial-scale anaerobic co-digestion (ACoD) facility. The models were used to predict biomethane for 1-day, 3-day, 5-day, 10-day, 20-day, 30-day, and 40-day time horizons. These models were fit on four years of operational data. The results showed that elastic net (a model with assumptions of linearity) was clearly outperformed by random forest and extreme gradient boosting (XGBoost), which had out-of-sample R² values ranging between 0.80 and 0.88, depending on the time horizon. In addition, feature importance and partial dependence analysis demonstrated the marginal and interaction effects on biomethane of selected biowaste inputs. For instance, food waste co-digested with percolate was shown to have strong positive interaction effects. One implication of this study is that XGBoost and random forest algorithms applied to industrial-scale ACoD data provide dependable prediction results and may be a useful complement for experimental and mechanistic/theoretical models of anaerobic digestion, especially where detailed substrate characterization is difficult. However, these models have limitations, and suggestions for deriving additional value from these methods are proposed.
Affiliation(s)
- Djavan De Clercq
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, China
- Zongguo Wen
- State Key Joint Laboratory of Environment Simulation and Pollution Control, School of Environment, Tsinghua University, China.
- Fan Fei
- College of Public Administration, Huazhong University of Science and Technology, China
- Luis Caicedo
- Bio-Tesseract, China; EARTH University Costa Rica, Costa Rica
- Kai Yuan
- Bio-Tesseract, China; Edinburgh Centre for Robotics, University of Edinburgh, Scotland, United Kingdom
- Ruoxi Shang
- Bio-Tesseract, China; College of Engineering, University of California, Berkeley, United States
29
Yoo TK, Ryu IH, Choi H, Kim JK, Lee IS, Kim JS, Lee G, Rim TH. Explainable Machine Learning Approach as a Tool to Understand Factors Used to Select the Refractive Surgery Technique on the Expert Level. Transl Vis Sci Technol 2020; 9:8. [PMID: 32704414 PMCID: PMC7346876 DOI: 10.1167/tvst.9.2.8] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 07/25/2019] [Accepted: 11/18/2019] [Indexed: 12/23/2022]
Abstract
Purpose: Recently, laser refractive surgery options, including laser epithelial keratomileusis, laser in situ keratomileusis, and small incision lenticule extraction, have successfully improved patients' quality of life. Evidence-based recommendation of an optimal surgery technique is valuable in increasing patient satisfaction. We developed an interpretable multiclass machine learning model that selects the laser surgery option at the expert level. Methods: A multiclass XGBoost model was constructed to classify patients into four categories: laser epithelial keratomileusis, laser in situ keratomileusis, small incision lenticule extraction, and contraindication groups. The analysis included 18,480 subjects who intended to undergo refractive surgery at the B&VIIT Eye center. Training (n = 10,561) and internal validation (n = 2640) were performed using subjects who visited between 2016 and 2017. The model was trained based on clinical decisions of highly experienced experts and ophthalmic measurements. External validation (n = 5279) was conducted using subjects who visited in 2018. The SHapley Additive exPlanations technique was adopted to explain the output of the XGBoost model. Results: The multiclass XGBoost model exhibited an accuracy of 81.0% and 78.9% when tested on the internal and external validation datasets, respectively. The SHapley Additive exPlanations explanations for the results were consistent with prior knowledge from ophthalmologists. The explanations from one-versus-one and one-versus-rest XGBoost classifiers were effective in making the multicategorical classification problem easy for users to understand. Conclusions: This study suggests an expert-level multiclass machine learning model for selecting the refractive surgery for patients. It also provided a clinical understanding of a multiclass problem based on an explainable artificial intelligence technique.
Translational Relevance: Explainable machine learning exhibits a promising future for increasing the practical use of artificial intelligence in ophthalmic clinics.
Affiliation(s)
- Tae Keun Yoo
- Department of Ophthalmology, Aerospace Medical Center, Republic of Korea Air Force, Cheongju, South Korea
- Tyler Hyungtaek Rim
- Singapore Eye Research Institute, Singapore National Eye Centre, Duke-NUS Medical School, Singapore, Singapore
30
Chen L, Yao X, Liu Y, Zhu Y, Chen W, Zhao X, Chi T. Measuring Impacts of Urban Environmental Elements on Housing Prices Based on Multisource Data—A Case Study of Shanghai, China. IJGI 2020; 9:106. [DOI: 10.3390/ijgi9020106] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 11/16/2022]
Abstract
Diverse urban environmental elements provide health and amenity value for residents. People are willing to pay a premium for a better environment. Thus, it is essential to assess the benefits and values of these environmental elements. However, limited by the interpretability of machine learning models, existing studies cannot fully uncover the complex nonlinear relationships between housing prices and environmental elements, or the spatial variations in the impacts of urban environmental elements on housing prices. This study explored the impacts of urban environmental elements on residential housing prices based on multisource data in Shanghai. A SHapley Additive exPlanations (SHAP) method was introduced to explain the impacts of urban environmental elements on housing prices. By combining the ensemble learning model and SHAP, the contributions of environmental characteristics derived from street view data and remote sensing data were computed and mapped. The experimental results show that the urban environmental characteristics together account for 16 percent of housing prices in Shanghai. The relationships between housing prices and two green characteristics (green view index from street view data and urban green coverage rate from remote sensing) are both nonlinear. Shanghai's homebuyers are willing to pay a premium for green only when the green view index or urban green coverage rate is high. However, there are significant differences between the impacts of the green view index and urban green coverage rate on housing prices. The sky view index has a negative influence on housing prices, which is probably because high-density and high-rise residential areas often have better living facilities. Residents in Shanghai are willing to pay a premium for high urban water coverage. The case of Shanghai shows that the proposed framework is practical and efficient.
This framework is believed to provide a tool to inform the decisions of housing buyers and property developers, as well as policies concerning land selling and buying, property development and urban environment improvement.
31
Jovanović G, Romanić SH, Stojić A, Klinčić D, Sarić MM, Letinić JG, Popović A. Introducing of modeling techniques in the research of POPs in breast milk - A pilot study. Ecotoxicol Environ Saf 2019; 172:341-347. [PMID: 30721878 DOI: 10.1016/j.ecoenv.2019.01.087] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/02/2018] [Revised: 01/25/2019] [Accepted: 01/25/2019] [Indexed: 06/09/2023]
Abstract
This study used advanced statistical and machine learning methods to investigate organochlorine pesticides (OCPs) and polychlorinated biphenyls (PCBs) in breast milk, assuming that in a complex biological mixture, the pollutants emitted from the same source or with similar properties are statistically interrelated and possibly exhibit non-linear dynamics. The elaborated analyses such as Unmix source apportionment characterized individual source groups, while guided regularized random forest indicated the pollutant dependence on the ortho-chlorine atom attached to the congener's phenyl ring and the mother's age. Mutual associations among PCBs were further discussed, but the results implied they were mostly not related to child delivery. PCB congeners -153, -180, -170, -118, -156, -105, and -138 appeared to be the compounds of utmost importance for mutual prediction with reference to their interrelations regarding chemical structure and metabolic processes in the mother's body. Finally, machine learning methods, which provided prediction relative errors lower than 30% and correlation coefficients higher than 0.90, suggested a possible strong non-linear relationship among the pollutants and, consequently, the complexity of their pathways in breast milk.
Affiliation(s)
- Gordana Jovanović
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia.
- Snježana Herceg Romanić
- Institute for Medical Research and Occupational Health, Ksaverska cesta 2, PO Box 291, 10001 Zagreb, Croatia.
- Andreja Stojić
- Institute of Physics Belgrade, National Institute of the Republic of Serbia, University of Belgrade, Pregrevica 118, 11080 Belgrade, Serbia.
- Darija Klinčić
- Institute for Medical Research and Occupational Health, Ksaverska cesta 2, PO Box 291, 10001 Zagreb, Croatia.
- Marijana Matek Sarić
- Department of Health Studies, University of Zadar, Splitska 1, 23000 Zadar, Croatia.
- Aleksandar Popović
- Faculty of Chemistry, University of Belgrade, Studentski trg 12-16, 11000 Belgrade, Serbia.