1
Hu P, Li J, Ma R, Zhang K, Guo Y, Li G. Temporomandibular joint CBCT image segmentation via multi-view ensemble learning network. Med Biol Eng Comput 2025; 63:693-706. [PMID: 39465436 DOI: 10.1007/s11517-024-03225-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Accepted: 10/12/2024] [Indexed: 10/29/2024]
Abstract
Accurate segmentation of the temporomandibular joint (TMJ) from cone beam CT (CBCT) images holds significant clinical value for diagnosing temporomandibular joint osteoarthrosis (TMJOA) and related conditions. Convolutional neural network-based medical image segmentation methods have achieved state-of-the-art performance in various segmentation tasks. However, 3D medical image segmentation requires substantial global context and rich spatial semantic information, demanding much more GPU memory and computational resources. To address these challenges, we propose a novel network, MVEL-Net (Multi-view Ensemble Learning Network), for TMJ CBCT image segmentation. By resampling images along three dimensions, we generate multiple weak learners with different spatial semantic information. A subsequent strong learning network integrates the outputs from these weak learners to achieve more accurate segmentation results. We evaluated our network model using a clinical dataset comprising 88 subjects with TMJ CBCT images. The average Dice similarity coefficient (DSC) was 0.9817 ± 0.0049, the average surface distance was 0.0540 ± 0.0179 mm, and the 95% Hausdorff distance was 0.1743 ± 0.0550 mm. Our proposed MVEL-Net demonstrates excellent segmentation performance on TMJ from CBCT images while using less GPU memory than other 3D networks. The effectiveness of this method in capturing spatial context could be leveraged for tasks such as organ segmentation from volumetric scans. This may facilitate wider adoption of AI-based solutions for automated analysis of 3D medical images.
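To make the reported Dice similarity coefficient concrete, the following minimal sketch computes DSC = 2|A∩B| / (|A| + |B|) on flattened binary masks. The `majority_vote` helper is purely an illustrative assumption: it stands in for the fusion of per-view outputs, whereas the abstract describes a learned strong network for that step. All function names are hypothetical.

```python
def majority_vote(view_masks):
    """Fuse binary masks voxel by voxel: foreground if more than half the views agree.

    Illustrative stand-in only; the abstract describes a learned fusion network.
    """
    n_views = len(view_masks)
    return [1 if sum(votes) * 2 > n_views else 0 for votes in zip(*view_masks)]


def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat 0/1 masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Three hypothetical per-view predictions for a four-voxel volume.
views = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
fused = majority_vote(views)
score = dice_coefficient(fused, [1, 1, 0, 1])
```

In practice the masks would be 3D arrays; flattening them does not change the Dice value because the metric only counts voxels.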
Affiliation(s)
- Piaolin Hu
  - School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
- Jupeng Li
  - School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
- Ruohan Ma
  - Peking University School and Hospital of Stomatology, Peking University, Beijing, 100195, China
- Kai Zhang
  - School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
- Yong Guo
  - School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
- Gang Li
  - Peking University School and Hospital of Stomatology, Peking University, Beijing, 100195, China

2
Chen F, Zhou H, Yu X, Zhao Y, Wang C, Dai B, Han S. Dual-Stage Stacking Machine Learning Method Considering Virtual Sample Generation for the Prediction of ZIF-8' BET Specific Surface Area with Experimental Validation. LANGMUIR : THE ACS JOURNAL OF SURFACES AND COLLOIDS 2025; 41:1733-1744. [PMID: 39818973 DOI: 10.1021/acs.langmuir.4c04088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/19/2025]
Abstract
The widespread application of metal-organic frameworks (MOFs) in wastewater and gas treatment has created an increasing demand for accurate and rapid assessment of their BET specific surface area. However, experimental methods for acquiring sufficient statistical data are often costly and time-consuming. Therefore, this study proposes a dual-stage stacking model with Gaussian mixture model-virtual sample generation (GMM-VSG) technology for BET specific surface area prediction. In this study, 90 real samples were selected from the MOF database and 300 virtual samples were generated. The performance on both real and virtual samples was evaluated using four machine learning models: Bayesian regression (Bayes), adaptive boosting (AdaBoost), random forest (RF), and extreme gradient boosting (XGBoost). Subsequently, the three best-performing models and a linear regression model were selected to construct a two-stage stacking model, which achieved an R² value of 0.974. Finally, experimental conditions were adjusted based on feature importance analysis during the validation process, and the results show that the prediction accuracy of the BET specific surface area is 0.943. This study contributes to the development of more efficient and accurate evaluation methods.
Affiliation(s)
- Fengfei Chen
  - School of Chemistry and Chemical Engineering, Shihezi University, Shihezi 832003, China
  - School of Chemical and Environmental Engineering, Shanghai Institute of Technology, Shanghai 201418, China
- Hongguang Zhou
  - School of Chemical and Environmental Engineering, Shanghai Institute of Technology, Shanghai 201418, China
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xiaohui Yu
  - School of Chemical and Environmental Engineering, Shanghai Institute of Technology, Shanghai 201418, China
- Yunpeng Zhao
  - School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Chenchen Wang
  - School of Chemical and Environmental Engineering, Shanghai Institute of Technology, Shanghai 201418, China
- Bin Dai
  - School of Chemistry and Chemical Engineering, Shihezi University, Shihezi 832003, China
- Sheng Han
  - School of Chemistry and Chemical Engineering, Shihezi University, Shihezi 832003, China
  - School of Chemical and Environmental Engineering, Shanghai Institute of Technology, Shanghai 201418, China

3
Zhu M, Xiao Z, Zhang T, Lu G. Construction of interpretable ensemble learning models for predicting bioaccumulation parameters of organic chemicals in fish. JOURNAL OF HAZARDOUS MATERIALS 2025; 482:136606. [PMID: 39579709 DOI: 10.1016/j.jhazmat.2024.136606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 11/14/2024] [Accepted: 11/19/2024] [Indexed: 11/25/2024]
Abstract
Accurate prediction of bioaccumulation parameters is essential for assessing exposure, hazards, and risks of chemicals. However, the majority of prediction models for bioaccumulation parameters are individual models based on a single algorithm and lack model interpretation, resulting in unsatisfactory prediction accuracy due to inherent constraints of the algorithm and weak interpretability. Ensemble learning (EL), which combines multiple algorithms, coupled with the SHapley Additive exPlanation (SHAP) method, may overcome these limitations. Herein, EL models were constructed for three bioaccumulation parameters using datasets covering 2496 chemicals. The EL models demonstrated superior prediction accuracy compared to both individual models developed in this study and those from previous research, achieving a coefficient of determination of up to 0.861 on the validation sets. Applicability domains were characterized using a structure-activity landscape-based (abbreviated as ADSAL) methodology. The optimal EL models, together with the ADSAL, were successfully used to predict bioaccumulation parameters for 4374 chemicals included in the Inventory of Existing Chemical Substances of China. Model interpretation using the SHAP method offered insight into key features influencing bioaccumulation potential, including hydrophobicity, water solubility, polarizability, ionization potential, weight, and volume of molecules. Overall, the study provides data and models to support the sound management and risk assessment of chemicals.
Affiliation(s)
- Minghua Zhu
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China
- Zijun Xiao
  - Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education), Dalian Key Laboratory on Chemicals Risk Control and Pollution Prevention Technology, School of Environmental Science and Technology, Dalian University of Technology, Dalian 116024, China
- Tao Zhang
  - State Key Laboratory of Urban Water Resources and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Guanghua Lu
  - Key Laboratory of Integrated Regulation and Resources Development of Shallow Lakes of Ministry of Education, Hohai University, Nanjing 210098, China; College of Environment, Hohai University, Nanjing 210098, China

4
Usman US, Salh YHM, Yan B, Namahoro JP, Zeng Q, Sallah I. Fluoride contamination in African groundwater: Predictive modeling using stacking ensemble techniques. THE SCIENCE OF THE TOTAL ENVIRONMENT 2024; 957:177693. [PMID: 39577590 DOI: 10.1016/j.scitotenv.2024.177693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 11/15/2024] [Accepted: 11/19/2024] [Indexed: 11/24/2024]
Abstract
Fluoride contamination of groundwater is a severe public health problem in Africa due to natural factors that include geological weathering of fluoride-bearing minerals and climatic conditions characterized by high evaporation rates that greatly elevate fluoride levels. Anthropogenic activities further aggravate the problem and have affected millions of people in countries such as South Africa, Tanzania, Nigeria, Ethiopia, Ghana, Kenya, Mauritania, Botswana, and Egypt. High fluoride levels of up to 10 mg/L have been encountered in parts of the East African Rift Valley, above the WHO's recommended limit of 1.5 mg/L, causing serious dental and skeletal fluorosis among the affected people. In this study, the distributions of F⁻ in groundwater of Africa were forecast using an advanced stacking ensemble learning model based on 11 crucial groundwater physicochemical variables and 6270 available records of observed concentrations. The enhanced algorithm incorporates randomized trees, Tree-Bag, RF, DT, XGB, and ET Machine as base learners, with a simple Naïve Bayes as the meta-learner. The model's AUC score of 0.86 accurately represented the uneven distributions of groundwater fluoride. The results showed that 20-35 % of the continent's eastern part and 10 % of its western region are at risk of having fluoride levels exceeding WHO limits, with an expected population of around 80 million. Regionally, fluoride contamination ranges from 0.1 to 3 mg/L in West Africa was range from 0.0 to 13.29 mg/L, 0.01-588 mg/L in East Africa, 0.04-65.9 mg/L in South Africa, and 0.1-10.5 mg/L in North and 0.01-1.9 mg/L in Central Africa. Na⁺ and HCO₃⁻ are the leading drivers of fluoride contamination in Africa, with Ca²⁺ and Cl⁻ contributing in some parts of the continent. This study helped identify health concerns linked to groundwater fluoride and offered guidance on assessing health risks in areas with sparse sample sizes.
Affiliation(s)
- Usman Sunusi Usman
  - School of Environmental Studies, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China
- Yousif Hassan Mohamed Salh
  - School of Environmental Studies, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China
- Bing Yan
  - School of Environmental Studies, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China
- Qian Zeng
  - School of Environmental Studies, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China
- Ismaila Sallah
  - School of Environmental Studies, China University of Geosciences Wuhan, 388 Lumo Road, Wuhan 430074, China

5
Arif U, Zhang C, Hussain S, Abbasi AR. An efficient interpretable stacking ensemble model for lung cancer prognosis. Comput Biol Chem 2024; 113:108248. [PMID: 39426256 DOI: 10.1016/j.compbiolchem.2024.108248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Revised: 09/29/2024] [Accepted: 10/09/2024] [Indexed: 10/21/2024]
Abstract
Lung cancer significantly contributes to global cancer mortality, posing challenges in clinical management. Early detection and accurate prognosis are crucial for improving patient outcomes. This study develops an interpretable stacking ensemble model (SEM) for lung cancer prognosis prediction and identifies key risk factors. Using a Kaggle dataset of 1000 patients with 22 variables, the model classifies prognosis into Low, Medium, and High-risk categories. The bootstrap method was employed for evaluation metrics, while SHAP (Shapley Additive Explanations) and LIME (Local Interpretable Model-agnostic Explanations) assessed model interpretability. Results showed SEM's superior interpretability over traditional models, such as Random Forest, Logistic Regression, Decision Tree, Gradient Boosting Machine, Extreme Gradient Boosting Machine, and Light Gradient Boosting Machine. SEM achieved an accuracy of 98.90 %, precision of 98.70 %, F1 score of 98.85 %, sensitivity of 98.77 %, specificity of 95.45 %, Cohen's kappa value of 94.56 %, and an AUC of 98.10 %. The SEM demonstrated robust performance in lung cancer prognosis, revealing chronic lung cancer and genetic risk as major factors.
Affiliation(s)
- Umair Arif
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xian, Shaanxi 710049, China
- Chunxia Zhang
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xian, Shaanxi 710049, China
- Sajid Hussain
  - School of Mathematics and Statistics, Xi'an Jiaotong University, Xian, Shaanxi 710049, China
- Abdul Rauf Abbasi
  - Department of Statistics, COMSATS University Islamabad, Lahore Campus, Lahore 5400, Pakistan

6
Chen X, Zhang X, Chen WZ. Advanced Predictive Modeling of Concrete Compressive Strength and Slump Characteristics: A Comparative Evaluation of BPNN, SVM, and RF Models Optimized via PSO. MATERIALS (BASEL, SWITZERLAND) 2024; 17:4791. [PMID: 39410362 PMCID: PMC11478029 DOI: 10.3390/ma17194791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2024] [Revised: 09/23/2024] [Accepted: 09/28/2024] [Indexed: 10/20/2024]
Abstract
This study presents the development of predictive models for concrete performance, specifically targeting the compressive strength and slump value, utilizing the quantities of individual raw materials in the concrete mix design as input variables. Three distinct machine learning approaches, namely Backpropagation Neural Network (BPNN), Support Vector Machine (SVM), and Random Forest (RF), were employed to establish the prediction models independently. In the model construction process, the Particle Swarm Optimization (PSO) algorithm was integrated with cross-validation to fine-tune the hyperparameters of each model, ensuring optimal performance. Following the completion of training and modeling, a comprehensive comparison of the predictive accuracy among the three models was conducted, with the aim of selecting the most suitable model for incorporation into an optimized objective function. The findings reveal that among the chosen machine learning techniques, BPNN exhibited superior predictive capabilities for the compressive strength of concrete. Specifically, in the validation set, BPNN achieved a high correlation coefficient (R) of 0.9531 between the predicted and actual outputs, accompanied by a low Root Mean Square Error (RMSE) of 4.2568 and a Mean Absolute Error (MAE) of 2.6627, indicating a precise and reliable prediction. Conversely, for the prediction of the concrete slump value, RF outperformed the other two models, demonstrating a correlation coefficient (R) of 0.8986, an RMSE of 9.4906, and an MAE of 5.5034 in the validation set. This underscores the effectiveness of RF in capturing the complexity and variability inherent in slump behavior. Overall, this research highlights the potential of integrating advanced machine learning algorithms with optimization techniques for enhancing the accuracy and efficiency of concrete performance predictions.
The identified optimal models, BPNN for compressive strength and RF for slump, can serve as valuable tools for engineers and researchers in the field of construction materials, facilitating the design of concrete mixes tailored to specific performance requirements.
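The three validation metrics quoted in this abstract (correlation coefficient R, RMSE, MAE) can be computed as in the minimal sketch below. The function name and the sample values are hypothetical and are not taken from the study's data.

```python
import math


def regression_metrics(actual, predicted):
    """Return (pearson_r, rmse, mae) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [p - a for a, p in zip(actual, predicted)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation between predicted and actual values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r = cov / math.sqrt(var_a * var_p)
    return r, rmse, mae


# Made-up strength values purely for illustration.
r, rmse, mae = regression_metrics([30.0, 40.0, 50.0], [32.0, 38.0, 53.0])
```

RMSE penalizes large errors more heavily than MAE, which is why the two values can differ noticeably when a few predictions are far off.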
Affiliation(s)
- Xuefei Chen
  - School of Civil Engineering, Putian University, Putian 351100, China
  - Engineering Research Center of Disaster Prevention and Mitigation of Southeast Coastal Engineering Structures (JDGC03), Fujian Province University, Putian 351100, China
- Xiucheng Zhang
  - School of Civil Engineering, Putian University, Putian 351100, China
  - Engineering Research Center of Disaster Prevention and Mitigation of Southeast Coastal Engineering Structures (JDGC03), Fujian Province University, Putian 351100, China
- Wei-Zhi Chen
  - Faculty of Innovation Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, Macao 999078, China

7
Hong W. Twistable and Stretchable Nasal Patch for Monitoring Sleep-Related Breathing Disorders Based on a Stacking Ensemble Learning Model. ACS APPLIED MATERIALS & INTERFACES 2024; 16:47337-47347. [PMID: 39192683 DOI: 10.1021/acsami.4c11493] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/29/2024]
Abstract
Obstructive sleep apnea syndrome disrupts sleep, destroys the homeostasis of biological systems such as metabolism and the immune system, and reduces learning ability and memory. The existing polysomnography used to measure sleep disorders is executed in an unfamiliar environment, which may result in sleep patterns that are different from usual, reducing accuracy. This study reports a machine learning-based personalized twistable patch system that can simply measure obstructive sleep apnea syndrome in daily life. The stretchable patch attaches directly to the nose in an integrated form factor, detecting sleep-disordered breathing by simultaneously sensing microscopic vibrations and airflow in the nasal cavity and paranasal sinuses. The highly sensitive multichannel patch, which can detect airflow at the level of 0.1 m/s, has flexibility via a unique slit pattern and fabric layer. Its response changes linearly with its curvature, with an R² of 0.992. The stacking ensemble learning model predicted the degree of sleep-disordered breathing with an accuracy of 92.9%. The approach demonstrates high sleep disorder detection capacity and proactive visual notification. It is expected to help as a diagnostic platform for the early detection of chronic diseases such as cerebrovascular disease and diabetes.
Affiliation(s)
- Wonki Hong
  - Department of Digital Healthcare, Daejeon University, Daejeon 34520, Republic of Korea

8
Aljamaan H. Dynamic stacking ensemble for cross-language code smell detection. PeerJ Comput Sci 2024; 10:e2254. [PMID: 39314734 PMCID: PMC11419637 DOI: 10.7717/peerj-cs.2254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Accepted: 07/22/2024] [Indexed: 09/25/2024]
Abstract
Code smells refer to poor design and implementation choices by software engineers that might affect the overall software quality. Code smell detection using machine learning models has become a popular area for building effective models that are capable of detecting different code smells in multiple programming languages. However, the process of building such effective models has not reached a state of stability, and most of the existing research focuses on Java code smell detection. The main objective of this article is to propose dynamic ensembles using two strategies, namely greedy search and backward elimination, which are capable of accurately detecting code smells in two programming languages (i.e., Java and Python), and which are less complex than full stacking ensembles. The detection performance of dynamic ensembles was investigated within the context of four Java and two Python code smells. The greedy search and backward elimination strategies yielded different lists of base models to build dynamic ensembles. In comparison to full stacking ensembles, dynamic ensembles yielded less complex models when they were used to detect most of the investigated Java and Python code smells, with the backward elimination strategy resulting in less complex models. Dynamic ensembles were able to perform comparably against full stacking ensembles with no significant detection loss. This article concludes that dynamic stacking ensembles were able to facilitate the effective and stable detection performance of Java and Python code smells over all base models and with less complexity than full stacking ensembles.
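As a rough illustration of the backward elimination idea, the sketch below starts from all base models and repeatedly drops the one whose removal does not hurt accuracy. It is a simplification under stated assumptions: the ensemble is combined by majority vote rather than by a stacked meta-model, the selection criterion is plain accuracy on a labelled set, and every name and data value is hypothetical rather than taken from the article.

```python
def vote_accuracy(pred_lists, labels):
    """Accuracy of a strict-majority vote over several models' 0/1 predictions."""
    correct = 0
    for i, y in enumerate(labels):
        ones = sum(preds[i] for preds in pred_lists)
        vote = 1 if ones * 2 > len(pred_lists) else 0  # ties resolve to 0
        correct += int(vote == y)
    return correct / len(labels)


def backward_elimination(model_preds, labels):
    """Drop base models one at a time while ensemble accuracy does not decrease."""
    selected = dict(model_preds)
    best = vote_accuracy(list(selected.values()), labels)
    while len(selected) > 1:
        candidate = None  # (model name, accuracy without that model)
        for name in selected:
            rest = [p for n, p in selected.items() if n != name]
            acc = vote_accuracy(rest, labels)
            if acc >= best and (candidate is None or acc > candidate[1]):
                candidate = (name, acc)
        if candidate is None:
            break  # every removal would reduce accuracy
        del selected[candidate[0]]
        best = candidate[1]
    return sorted(selected), best


labels = [1, 0, 1, 0, 1, 0]
model_preds = {
    "a": [1, 0, 1, 0, 1, 0],  # hypothetical strong model
    "b": [0, 1, 0, 1, 0, 1],  # hypothetical model that is always wrong
    "c": [1, 0, 1, 0, 0, 1],  # hypothetical mediocre model
}
kept, accuracy = backward_elimination(model_preds, labels)
```

A greedy forward search would run the same loop in the opposite direction, starting from an empty set and adding the model that improves accuracy most.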
Affiliation(s)
- Hamoud Aljamaan
  - Information and Computer Science Department, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
  - Interdisciplinary Research Center for Finance and Digital Economy, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia

9
Cao W, Zhang Z, Fu Y, Zhao L, Ren Y, Nan T, Guo H. Prediction of arsenic and fluoride in groundwater of the North China Plain using enhanced stacking ensemble learning. WATER RESEARCH 2024; 259:121848. [PMID: 38824797 DOI: 10.1016/j.watres.2024.121848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 05/20/2024] [Accepted: 05/28/2024] [Indexed: 06/04/2024]
Abstract
Chronic exposure to elevated geogenic arsenic (As) and fluoride (F⁻) concentrations in groundwater poses a significant global health risk. In regions around the world where regular groundwater quality assessments are limited, the presence of harmful levels of As and F⁻ in shallow groundwater extracted from specific wells remains uncertain. This study utilized an enhanced stacking ensemble learning model to predict the distributions of As and F⁻ in shallow groundwater based on 4,393 available datasets of observed concentrations and forty relevant environmental factors. The enhanced model was obtained by fusing well-suited Extreme Gradient Boosting, Random Forest, and Support Vector Machine as the base learners and a structurally simple Linear Discriminant Analysis as the meta-learner. The model precisely captured the patchy distributions of groundwater As and F⁻ with AUC values of 0.836 and 0.853, respectively. The findings revealed that 9.0% of the study area was characterized by a high As risk in shallow groundwater, while 21.2% was at high F⁻ risk. About 0.2% of the study area shows elevated levels of both. The affected populations are estimated at approximately 7.61 million, 34.1 million, and 0.2 million, respectively. Furthermore, sedimentary environment exerted the greatest influence on the distribution of groundwater As, followed by human activities and climate, contributing 29.5%, 28.1%, and 21.9%, respectively. Likewise, sedimentary environment was the primary factor affecting groundwater F⁻ distribution, followed by hydrogeology and soil physicochemical properties, contributing 27.8%, 24.0%, and 23.3%, respectively. This study contributed to the identification of health risks associated with shallow groundwater As and F⁻, and provided insights into evaluating health risks in regions with limited samples.
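The base-learner and meta-learner arrangement described here can be sketched in a few lines of dependency-free code. This is a toy under loud assumptions, not the study's model: one-feature threshold "stumps" stand in for XGBoost, Random Forest, and SVM, a lookup table over base predictions stands in for Linear Discriminant Analysis, and the data and all names are invented for illustration. What it does keep is the defining step of stacking: the meta-learner is trained on out-of-fold predictions of the base learners.

```python
from collections import Counter, defaultdict


def fit_stump(rows, feature):
    """Toy base learner: threshold one feature at the midpoint of the class means."""
    pos = [x[feature] for x, y in rows if y == 1]
    neg = [x[feature] for x, y in rows if y == 0]
    mean_pos, mean_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    sign = 1 if mean_pos >= mean_neg else -1
    return lambda x: 1 if sign * (x[feature] - threshold) > 0 else 0


def fit_stacking(rows, n_features, k=3):
    """Train stumps as base learners and a lookup-table meta-learner on out-of-fold predictions."""
    meta_rows = []
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        held_out = [r for i, r in enumerate(rows) if i % k == fold]
        stumps = [fit_stump(train, f) for f in range(n_features)]
        for x, y in held_out:
            meta_rows.append((tuple(s(x) for s in stumps), y))

    # Meta-learner: most frequent label for each pattern of base predictions.
    votes = defaultdict(Counter)
    for pattern, y in meta_rows:
        votes[pattern][y] += 1
    table = {p: c.most_common(1)[0][0] for p, c in votes.items()}

    base = [fit_stump(rows, f) for f in range(n_features)]  # refit on all data

    def predict(x):
        pattern = tuple(s(x) for s in base)
        if pattern in table:
            return table[pattern]
        return 1 if sum(pattern) * 2 > len(pattern) else 0  # unseen pattern: majority vote

    return predict


# Invented data: feature 0 separates the classes, feature 1 is noise.
rows = [
    ((0.10, 5.0), 0), ((0.20, 1.0), 0), ((0.30, 4.0), 0), ((0.90, 2.0), 1),
    ((0.80, 5.0), 1), ((0.70, 1.0), 1), ((0.15, 3.0), 0), ((0.85, 3.0), 1),
    ((0.25, 2.0), 0), ((0.75, 4.0), 1), ((0.05, 1.5), 0), ((0.95, 4.5), 1),
]
model = fit_stacking(rows, n_features=2)
```

On this toy data the meta-learner ends up following the informative stump and ignoring the noisy one, which is the behaviour stacking is meant to provide when base learners differ in quality.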
Affiliation(s)
- Wengeng Cao
  - The Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geosciences, Shijiazhuang 050061, China; Key Laboratory of Groundwater Sciences and Engineering, Ministry of Natural Resources, Shijiazhuang 050061, China
- Zhuo Zhang
  - Tianjin Center (North China Center for Geoscience Innovation), China Geological Survey, Tianjin 300170, China
- Yu Fu
  - North China University of Water Resources and Electric Power, Zhengzhou 450046, China
- Lihua Zhao
  - Hebei Provincial academy of water resources, Shijiazhuang 050057, China
- Yu Ren
  - The Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geosciences, Shijiazhuang 050061, China; Key Laboratory of Groundwater Sciences and Engineering, Ministry of Natural Resources, Shijiazhuang 050061, China
- Tian Nan
  - The Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geosciences, Shijiazhuang 050061, China; Key Laboratory of Groundwater Sciences and Engineering, Ministry of Natural Resources, Shijiazhuang 050061, China
- Huaming Guo
  - State Key Laboratory of Biogeology and Environmental Geology, China University of Geosciences, Beijing 100083, China

10
Talwar AA, Desai AA, McAuliffe PB, Broach RB, Hsu JY, Liu T, Udupa JK, Tong Y, Torigian DA, Fischer JP. Optimal computed tomography-based biomarkers for prediction of incisional hernia formation. Hernia 2024; 28:17-24. [PMID: 37676569 PMCID: PMC11235401 DOI: 10.1007/s10029-023-02835-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 07/04/2023] [Indexed: 09/08/2023]
Abstract
PURPOSE: Unstructured data are an untapped source for surgical prediction. Modern image analysis and machine learning (ML) can harness unstructured data in medical imaging. Incisional hernia (IH) is a pervasive surgical disease, well-suited for prediction using image analysis. Our objective was to identify optimal biomarkers (OBMs) from preoperative abdominopelvic computed tomography (CT) imaging which are most predictive of IH development. METHODS: Two hundred and twelve rigorously matched colorectal surgery patients at our institution were included. Preoperative abdominopelvic CT scans were segmented to derive linear, volumetric, intensity-based, and textural features. These features were analyzed to find a small subset of OBMs, which are maximally predictive of IH. Three ML classifiers (Ensemble Boosting, Random Forest, SVM) trained on these OBMs were used for prediction of IH. RESULTS: Altogether, 279 features were extracted from each CT scan. The most predictive OBMs found were: (1) abdominopelvic visceral adipose tissue (VAT) volume, normalized for height; (2) abdominopelvic skeletal muscle tissue volume, normalized for height; and (3) pelvic VAT volume to pelvic outer aspect of body wall skeletal musculature (OAM) volume ratio. Among ML prediction models, Ensemble Boosting produced the best performance with an AUC of 0.85, accuracy of 0.83, sensitivity of 0.86, and specificity of 0.81. CONCLUSION: These OBMs suggest increased intra-abdominopelvic volume/pressure as the salient pathophysiologic driver and likely mechanism for IH formation. ML models using these OBMs are highly predictive for IH development. The next generation of surgical prediction will maximize the utility of unstructured data using advanced image analysis and ML.
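For readers less familiar with the reported classification metrics, the minimal sketch below derives accuracy, sensitivity, and specificity from binary labels and predictions. AUC is not computed here because it requires ranked scores rather than hard predictions. The function name and sample values are hypothetical and unrelated to the study's cohort.

```python
def classification_metrics(labels, predictions):
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of actual positives that were detected
        "specificity": tn / (tn + fp),  # share of actual negatives that were cleared
    }


# Made-up outcomes: 1 = condition developed, 0 = it did not.
metrics = classification_metrics(
    [1, 1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1, 0, 1],
)
```

Sensitivity and specificity are reported separately because accuracy alone can look good on imbalanced cohorts even when one class is poorly predicted.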
Affiliation(s)
- A A Talwar
  - Division of Plastic Surgery, Department of Surgery, University of Pennsylvania Health System, 3400 Civic Center Boulevard, 14th floor South Tower, Philadelphia, PA, 19104, USA
- A A Desai
  - Division of Plastic Surgery, Department of Surgery, University of Pennsylvania Health System, 3400 Civic Center Boulevard, 14th floor South Tower, Philadelphia, PA, 19104, USA
- P B McAuliffe
  - Division of Plastic Surgery, Department of Surgery, University of Pennsylvania Health System, 3400 Civic Center Boulevard, 14th floor South Tower, Philadelphia, PA, 19104, USA
- R B Broach
  - Division of Plastic Surgery, Department of Surgery, University of Pennsylvania Health System, 3400 Civic Center Boulevard, 14th floor South Tower, Philadelphia, PA, 19104, USA
- J Y Hsu
  - Division of Biostatistics, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- T Liu
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao, China
- J K Udupa
  - Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- Y Tong
  - Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- D A Torigian
  - Medical Image Processing Group, Department of Radiology, University of Pennsylvania, Philadelphia, PA, USA
- J P Fischer
  - Division of Plastic Surgery, Department of Surgery, University of Pennsylvania Health System, 3400 Civic Center Boulevard, 14th floor South Tower, Philadelphia, PA, 19104, USA

11
Kharazi Esfahani P, Akbari M, Khalili Y. A comparative study of fracture conductivity prediction using ensemble methods in the acid fracturing treatment in oil wells. Sci Rep 2024; 14:648. [PMID: 38182684 PMCID: PMC10770359 DOI: 10.1038/s41598-023-50731-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Accepted: 12/24/2023] [Indexed: 01/07/2024]
Abstract
The study of acid fracture conductivity stands as a pivotal aspect of petroleum engineering, offering a well-established technique to amplify production rates in carbonate reservoirs. This research delves into the intricate dynamics influencing the conductivity of acid fractures, particularly under varying closure stresses and in diverse rock formations. The conductivity of acid fractures is intricately interconnected with the dissolution of rock, etching patterns on fracture surfaces, rock strength, and closure stress. To accurately predict fracture conductivity under different closure stresses, a robust model is necessary. This model involves assessing both the baseline fracture conductivity under zero closure stress and the rate of conductivity variation as closure stress fluctuates. Key among the influential factors affecting fracture conductivity is the type of rock within the reservoir. Understanding and predicting the behavior of different formations under disparate closure stresses poses a significant challenge, as does deciphering the diverse effects of treatment parameters such as acid injection rate and strength on fracture conductivity. In this study, the predictive power of XGBoost, a machine learning algorithm, was explored in assessing acid fracture conductivity in dolomite and limestone formations. The findings revealed XGBoost's ability to outperform previous studies in predicting fracture conductivity in both types of formations. Notably, it exhibited superior accuracy in forecasting fracture conductivity under varying treatment conditions, underscoring its robustness and versatility. The research underscores the pivotal role of closure stress, dissolution rate of rock (DREC), and rock strength in influencing fracture conductivity. By integrating these parameters into the design of acid fracturing operations, accurate predictions can be achieved, allowing for the optimization of treatment designs. 
This study illuminates the potential of XGBoost in optimizing acid fracturing treatments, ultimately bolstering well productivity in carbonate reservoirs. Furthermore, it advocates for the essential nature of separate modeling and analysis based on rock types to comprehend and optimize fracturing processes. The comparison between dolomite and limestone formations unveiled distinct conductivity behaviors, underlining the significance of tailored analyses based on rock type for precise operational optimization.
Affiliation(s)
- Parsa Kharazi Esfahani
- Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
- Department of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
| | - Mohammadreza Akbari
- Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran.
| | - Yasin Khalili
- Department of Petroleum Engineering, Amirkabir University of Technology (Tehran Polytechnic), 424 Hafez Avenue, Box 15875-4413, Tehran, 1591634311, Iran
| |
|
12
|
Wang C, Cui Z, Yang J, Han M, Carneiro G, Shen D. BowelNet: Joint Semantic-Geometric Ensemble Learning for Bowel Segmentation From Both Partially and Fully Labeled CT Images. IEEE TRANSACTIONS ON MEDICAL IMAGING 2023; 42:1225-1236. [PMID: 36449590 DOI: 10.1109/tmi.2022.3225667] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/17/2023]
Abstract
Accurate bowel segmentation is essential for diagnosis and treatment of bowel cancers. Unfortunately, segmenting the entire bowel in CT images is quite challenging due to unclear boundary, large shape, size, and appearance variations, as well as diverse filling status within the bowel. In this paper, we present a novel two-stage framework, named BowelNet, to handle the challenging task of bowel segmentation in CT images, with two stages of 1) jointly localizing all types of the bowel, and 2) finely segmenting each type of the bowel. Specifically, in the first stage, we learn a unified localization network from both partially- and fully-labeled CT images to robustly detect all types of the bowel. To better capture unclear bowel boundary and learn complex bowel shapes, in the second stage, we propose to jointly learn semantic information (i.e., bowel segmentation mask) and geometric representations (i.e., bowel boundary and bowel skeleton) for fine bowel segmentation in a multi-task learning scheme. Moreover, we further propose to learn a meta segmentation network via pseudo labels to improve segmentation accuracy. By evaluating on a large abdominal CT dataset, our proposed BowelNet method can achieve Dice scores of 0.764, 0.848, 0.835, 0.774, and 0.824 in segmenting the duodenum, jejunum-ileum, colon, sigmoid, and rectum, respectively. These results demonstrate the effectiveness of our proposed BowelNet framework in segmenting the entire bowel from CT images.
|
13
|
Mohammed A, Kora R. A Comprehensive Review on Ensemble Deep Learning: Opportunities and Challenges. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2023. [DOI: 10.1016/j.jksuci.2023.01.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
|
14
|
Zhang X, Ono JP, Song H, Gou L, Ma KL, Ren L. SliceTeller: A Data Slice-Driven Approach for Machine Learning Model Validation. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2023; 29:842-852. [PMID: 36179005 DOI: 10.1109/tvcg.2022.3209465] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/16/2023]
Abstract
Real-world machine learning applications need to be thoroughly evaluated to meet critical product requirements for model release, to ensure fairness for different groups or individuals, and to achieve a consistent performance in various scenarios. For example, in autonomous driving, an object classification model should achieve high detection rates under different conditions of weather, distance, etc. Similarly, in the financial setting, credit-scoring models must not discriminate against minority groups. These conditions or groups are called "Data Slices". In product MLOps cycles, product developers must identify such critical data slices and adapt models to mitigate data slice problems. Discovering where models fail, understanding why they fail, and mitigating these problems are therefore essential tasks in the MLOps life-cycle. In this paper, we present SliceTeller, a novel tool that allows users to debug, compare and improve machine learning models driven by critical data slices. SliceTeller automatically discovers problematic slices in the data and helps the user understand why models fail. More importantly, we present an efficient algorithm, SliceBoosting, to estimate trade-offs when prioritizing the optimization over certain slices. Furthermore, our system empowers model developers to compare and analyze different model versions during model iterations, allowing them to choose the model version best suited to their applications. We evaluate our system with three use cases, including two real-world use cases of product development, to demonstrate the power of SliceTeller in the debugging and improvement of product-quality ML models.
|
15
|
Pan X, Zhang G, Lin A, Guan X, Chen P, Ge Y, Chen X. An evaluation model for children's foot & ankle deformity severity using sparse multi-objective feature selection algorithm. Comput Biol Med 2022; 151:106229. [PMID: 36308897 DOI: 10.1016/j.compbiomed.2022.106229] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/12/2022] [Revised: 10/08/2022] [Accepted: 10/16/2022] [Indexed: 12/27/2022]
Abstract
Foot & ankle deformity is a chronic disease with high incidence and is best treated in childhood. However, the current diagnostic procedures rely on doctors' consultation and empirical judgment, and lack objective and quantitative evaluation methods, resulting in low screening rates. To solve this problem, this paper aims to construct an evaluation model for children's foot & ankle deformity through data mining and machine learning technologies. Firstly, it proposes the grading rules for children's foot & ankle deformity severity based on analyzing the existing quantitative indexes and expert experience. Then the 3D foot scanner is used to collect the sample data including 30 foot structure indexes. Finally, an advanced sparse multi-objective evolutionary algorithm (sparse MO-FS) is presented for feature selection. The effectiveness of the proposed sparse MO-FS and its search efficiency are demonstrated by comparison with 8 feature selection methods and 7 search strategies. Using sparse MO-FS, foot length, arch index, ankle index, and hallux valgus index are selected, which not only simplifies the evaluation model but also improves the average classification accuracy of random forest to more than 98%.
Affiliation(s)
- Xiaotian Pan
- School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics, Hangzhou 310018, China.
| | - Guodao Zhang
- School of Media and Design, Hangzhou Dianzi University, Hangzhou 310018, China.
| | - Aiju Lin
- College of International Education, Wenzhou University, Wenzhou 325035, China.
| | - Xiaochun Guan
- Department of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China.
| | - PingKuo Chen
- Great Bay University, Dongguan City 523000, China.
| | - Yisu Ge
- Key Laboratory of Intelligent Informatics for Safety & Emergency of Zhejiang Province, Wenzhou University, Wenzhou 325100, China.
| | - Xin Chen
- Orthopedics Department of The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China.
| |
|
16
|
Augmented language model with deep learning adaptation on sentiment analysis for E-learning recommendation. COGN SYST RES 2022. [DOI: 10.1016/j.cogsys.2022.07.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/31/2022]
|
17
|
Assessment of Machine Learning Techniques for Oil Rig Classification in C-Band SAR Images. REMOTE SENSING 2022. [DOI: 10.3390/rs14132966] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 02/06/2023]
Abstract
This article aims at performing maritime target classification in SAR images using machine learning (ML) and deep learning (DL) techniques. In particular, the targets of interest are oil platforms and ships located in the Campos Basin, Brazil. Two convolutional neural networks (CNNs), VGG-16 and VGG-19, were used for attribute extraction. The logistic regression (LR), random forest (RF), support vector machine (SVM), k-nearest neighbours (kNN), decision tree (DT), naive Bayes (NB), neural networks (NET), and AdaBoost (ADBST) schemes were considered for classification. The target classification methods were evaluated using polarimetric images obtained from the C-band synthetic aperture radar (SAR) system Sentinel-1. Classifiers are assessed by the accuracy indicator. The LR, SVM, NET, and stacking results indicate better performance, with accuracy ranging from 84.1% to 85.5%. The Kruskal–Wallis test shows a significant difference among the tested classifiers, indicating that some classifiers yield different accuracy results. The optimizations provide results with more significant accuracy gains, making them competitive with those shown in the literature. There is no exact combination of methods for SAR image classification that will always guarantee the best accuracy. The optimizations performed in this article were for the specific data set of the Campos Basin, and results may change depending on the data set format and the number of images.
|
18
|
Chatzimparmpas A, Martins RM, Kucher K, Kerren A. FeatureEnVi: Visual Analytics for Feature Engineering Using Stepwise Selection and Semi-Automatic Extraction Approaches. IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS 2022; 28:1773-1791. [PMID: 34990365 DOI: 10.1109/tvcg.2022.3141040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/14/2023]
Abstract
The machine learning (ML) life cycle involves a series of iterative steps, from the effective gathering and preparation of the data (including complex feature engineering processes) to the presentation and improvement of results, with various algorithms to choose from in every step. Feature engineering in particular can be very beneficial for ML, leading to numerous improvements such as boosting the predictive results, decreasing computational times, reducing excessive noise, and increasing the transparency behind the decisions taken during the training. Despite that, while several visual analytics tools exist to monitor and control the different stages of the ML life cycle (especially those related to data and algorithms), feature engineering support remains inadequate. In this paper, we present FeatureEnVi, a visual analytics system specifically designed to assist with the feature engineering process. Our proposed system helps users to choose the most important feature, to transform the original features into powerful alternatives, and to experiment with different feature generation combinations. Additionally, data space slicing allows users to explore the impact of features on both local and global scales. FeatureEnVi utilizes multiple automatic feature selection techniques; furthermore, it visually guides users with statistical evidence about the influence of each feature (or subsets of features). The final outcome is the extraction of heavily engineered features, evaluated by multiple validation metrics. The usefulness and applicability of FeatureEnVi are demonstrated with two use cases and a case study. We also report feedback from interviews with two ML experts and a visualization researcher who assessed the effectiveness of our system.
|
19
|
Mai H, Le TC, Hisatomi T, Chen D, Domen K, Winkler DA, Caruso RA. Use of metamodels for rapid discovery of narrow bandgap oxide photocatalysts. iScience 2021; 24:103068. [PMID: 34585115 PMCID: PMC8455646 DOI: 10.1016/j.isci.2021.103068] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/08/2021] [Revised: 07/07/2021] [Accepted: 08/25/2021] [Indexed: 12/03/2022] Open
Abstract
New photocatalysts are traditionally identified through trial-and-error methods. Machine learning has shown considerable promise for improving the efficiency of photocatalyst discovery from a large potential pool. Here, we describe a multi-step, target-driven consensus method using a stacking meta-learning algorithm that robustly predicts bandgaps and H2 evolution activities of photocatalysts. Trained on small datasets, these models can rapidly screen a large space (>10 million materials) to identify promising, non-toxic compounds as candidate water splitting photocatalysts. Two effective compounds and two controls possessing optimal bandgap values (∼2 eV) but not photoactivity as predicted by the models were synthesized. Their experimentally measured bandgaps and H2 evolution activities were consistent with the predictions. Conspicuously, the two compounds with strong photoactivities under UV and visible light are promising visible-light-driven water splitting photocatalysts. This study demonstrates the power of machine learning and the potential of big data to accelerate discovery of next-generation photocatalysts.
Affiliation(s)
- Haoxin Mai
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
| | - Tu C. Le
- School of Engineering, STEM College, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
| | - Takashi Hisatomi
- Research Initiative for Supra-Materials (RISM), Shinshu University, 4-17-1 Wakasato, Nagano 380-8553, Japan
| | - Dehong Chen
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
| | - Kazunari Domen
- Research Initiative for Supra-Materials (RISM), Shinshu University, 4-17-1 Wakasato, Nagano 380-8553, Japan
- Office of University Professors, the University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo 113-8656, Japan
| | - David A. Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC 3052, Australia
- School of Biochemistry and Genetics, La Trobe University, Kingsbury Drive, 3042 Bundoora, Australia
- School of Pharmacy, University of Nottingham, NG7 2RD Nottingham, UK
| | - Rachel A. Caruso
- Applied Chemistry and Environmental Science, School of Science, STEM College, RMIT University, GPO Box 2476, Melbourne, VIC 3001, Australia
| |
|
20
|
Machine learning-based approach for disease severity classification of carpal tunnel syndrome. Sci Rep 2021; 11:17464. [PMID: 34465860 PMCID: PMC8408248 DOI: 10.1038/s41598-021-97043-7] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 04/15/2021] [Accepted: 08/12/2021] [Indexed: 12/23/2022] Open
Abstract
Identifying the severity of carpal tunnel syndrome (CTS) is essential to providing appropriate therapeutic interventions. We developed and validated machine-learning (ML) models for classifying CTS severity. Here, 1037 CTS hands with 11 variables each were retrospectively analyzed. CTS was confirmed using electrodiagnosis, and its severity was classified into three grades: mild, moderate, and severe. The dataset was randomly split into a training (70%) and test (30%) set. A total of 507 mild, 276 moderate, and 254 severe CTS hands were included. Extreme gradient boosting (XGB) showed the highest external validation accuracy in the multi-class classification at 76.6% (95% confidence interval [CI] 71.2–81.5). XGB also had an optimal model training accuracy of 76.1%. Random forest (RF) and k-nearest neighbors had the second-highest external validation accuracy of 75.6% (95% CI 70.0–80.5). For the RF and XGB models, the numeric rating scale of pain was the most important variable, and body mass index was the second most important. The one-versus-rest classification yielded improved external validation accuracies for each severity grade compared with the multi-class classification (mild, 83.6%; moderate, 78.8%; severe, 90.9%). The CTS severity classification based on the ML model was validated and is readily applicable to aiding clinical evaluations.
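A note on reading the accuracies above: one-versus-rest accuracy for a grade counts a hand as correct whenever the model gets the "this grade or not" question right, so confusing the two other grades with each other is not penalized. The minimal Python sketch below illustrates this by rescoring a single set of three-grade predictions both ways. The labels are hypothetical and are not data from the study, the function names are illustrative, and the study may well have trained separate binary models rather than rescoring one model as done here.

```python
GRADES = ("mild", "moderate", "severe")


def multiclass_accuracy(y_true, y_pred):
    """Fraction of hands whose predicted grade matches the true grade exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_accuracy(y_true, y_pred, grade):
    """Fraction of hands for which 'is it this grade?' is answered correctly."""
    hits = sum((t == grade) == (p == grade) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical labels for six hands (assumption, not study data).
y_true = ["mild", "mild", "moderate", "moderate", "severe", "severe"]
y_pred = ["mild", "moderate", "moderate", "severe", "severe", "severe"]

overall = multiclass_accuracy(y_true, y_pred)
per_grade = {g: one_vs_rest_accuracy(y_true, y_pred, g) for g in GRADES}

print(f"multi-class accuracy: {overall:.3f}")
for grade, acc in per_grade.items():
    print(f"one-vs-rest accuracy ({grade}): {acc:.3f}")
```

When the same predictions are rescored this way, every exact match is also a one-versus-rest match, so the per-grade figures can never fall below the multi-class figure. The two kinds of accuracy are therefore not directly comparable as measures of model quality.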
|
21
|
|