1
Ensemble machine learning prediction of anaerobic co-digestion of manure and thermally pretreated harvest residues. BIORESOURCE TECHNOLOGY 2024; 402:130793. [PMID: 38703965 DOI: 10.1016/j.biortech.2024.130793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 04/30/2024] [Accepted: 05/01/2024] [Indexed: 05/06/2024]
Abstract
This study aimed to clarify the statistical accuracy assessment approaches used in recent biogas prediction studies, using a state-of-the-art ensemble machine learning approach evaluated by 10-fold cross-validation in 100 repetitions. Three thermally pretreated harvest residue types (maize stover, sunflower stalk and soybean straw) and manure were anaerobically co-digested, measuring biogas and methane yield alongside eight thermal preprocessing and biomass covariates. These were the inputs to an ensemble machine learning approach for biogas and methane yield prediction, employing three feature selection approaches. The Support Vector Machine prediction with Recursive Feature Elimination resulted in the highest prediction accuracy, achieving coefficients of determination of 0.820 and 0.823 for biogas and methane yield prediction, respectively. This study demonstrated an extreme dependency of prediction accuracy on input dataset properties, which could only be mitigated with ensemble machine learning, and strongly suggested that the split-sample approach, often used in previous studies, should be avoided.
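The evaluation protocol named in this abstract, 10-fold cross-validation repeated 100 times, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `repeated_kfold` and `r_squared` are hypothetical, and the model fitting and feature selection steps are left out.

```python
import random

def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Hypothetical helper: each repetition reshuffles the sample indices,
    then deals them into k disjoint test folds.
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]                      # every k-th shuffled index
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield r, f, train, test

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

A single split-sample evaluation yields one score from one partition, whereas this scheme yields k × repeats scores, so the spread of the scores shows how strongly the estimate depends on the partition.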
2
Choice of the Right Supporting Electrolyte in Electrochemical Reductions: A Principal Component Analysis. J Am Chem Soc 2024. [PMID: 38785120 DOI: 10.1021/jacs.4c00910] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
We present an analysis of a set of molecular, electrical, and electronic properties for a large number of the cations of quaternary ammonium salts usually employed as supporting electrolytes in cathodic reduction reactions. The goal of the present study is to define a measure for the quality of a supporting electrolyte in terms of the yield of the reaction considered. We performed a principal component analysis using the normalized values of the properties in order to lower the number of relevant reaction coordinates and find that the integral variance of 13 properties can well be represented by three principal components. The yield of the electrochemical hydrodimerization of acrylonitrile employing different quaternary ammonium salts as supporting electrolytes was determined in a series of experiments. We found only a very weak correlation between the yield and the values of the properties but a strong correlation between the yield and the values of the most important principal component. Very similar results are obtained for two further existing systematic experimental studies of the impact of the supporting electrolyte on the yield of cathodic reductions. For all three example reactions, a supervised regression using the two most important principal components as variables yields excellent values for the coefficients of determination. For comparison, we also applied our methodology to sets of purely structure-based features that are usually employed in cheminformatics and obtained results of almost similar quality. We therefore conjecture that our methodology in combination with a small number of experiments can be used to predict the yield of a given cathodic reduction on the basis of the properties of the supporting electrolyte.
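The dimensionality reduction step described in this abstract (normalize the properties, then project onto principal components) can be illustrated with a small standard-library sketch. It is an assumption-laden toy, not the authors' code: the helper names are hypothetical, only the first component is extracted (by power iteration), and every property column is assumed to be non-constant.

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit (population) variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
            for j in range(m)]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]

def first_component(rows, iterations=200):
    """Return (direction, explained-variance ratio) of the first PC."""
    z = standardize(rows)
    n, m = len(z), len(z[0])
    # Correlation matrix of the standardized properties.
    cov = [[sum(r[i] * r[j] for r in z) / n for j in range(m)]
           for i in range(m)]
    # Non-uniform start vector; power iteration converges to the
    # dominant eigenvector unless the start is orthogonal to it.
    v = [1.0 / (j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    return v, eigenvalue / m   # trace of a correlation matrix equals m

def component_scores(rows, direction):
    """Project each standardized sample onto the given direction."""
    return [sum(a * b for a, b in zip(r, direction))
            for r in standardize(rows)]
```

The scores returned by `component_scores` are the kind of low-dimensional variable that a subsequent regression against yield would use in place of the individual properties.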
3
Explainable cancer factors discovery: Shapley additive explanation for machine learning models demonstrates the best practices in the case of pancreatic cancer. Pancreatology 2024; 24:404-423. [PMID: 38342661 DOI: 10.1016/j.pan.2024.02.002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 01/07/2024] [Accepted: 02/05/2024] [Indexed: 02/13/2024]
Abstract
Pancreatic cancer is a digestive tract cancer with a high mortality rate. Despite the wide range of available treatments and improvements in surgery, chemotherapy, and radiation therapy, the five-year prognosis for individuals diagnosed with pancreatic cancer remains poor. Whether immunotherapy can be used to treat pancreatic cancer still requires research. The goals of our research were to understand the tumor microenvironment of pancreatic cancer, find a useful biomarker to assess patient prognosis, and investigate its biological relevance. In this paper, machine learning methods such as random forest were combined with weighted gene co-expression networks to screen hub immune-related genes (hub-IRGs). A LASSO regression model was then used for further selection, yielding eight hub-IRGs. Based on the hub-IRGs, we created a prognostic risk prediction model for pancreatic adenocarcinoma (PAAD) that stratifies patients accurately and produces a prognostic risk score (IRG_Score) for each patient. In the raw data set and the validation data set, the five-year area under the curve (AUC) for this model was 0.9 and 0.7, respectively. Shapley additive explanation (SHAP) was used to rank the factors influencing prognostic risk prediction from a machine learning perspective and to identify the most influential genes or clinical factors. The five most important factors were TRIM67, CORT, PSPN, SCAMP5, and RFXAP, all of which are genes. In summary, the eight hub-IRGs showed accurate risk prediction performance and biological significance, which was validated in other cancers. The SHAP results helped to clarify the molecular mechanism of pancreatic cancer.
4
Potential therapeutic targets for COVID-19 complicated with pulmonary hypertension: a bioinformatics and early validation study. Sci Rep 2024; 14:9294. [PMID: 38653779 DOI: 10.1038/s41598-024-60113-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2023] [Accepted: 04/18/2024] [Indexed: 04/25/2024] Open
Abstract
Coronavirus disease (COVID-19) and pulmonary hypertension (PH) are closely correlated. However, the mechanism is still poorly understood. In this article, we analyzed the molecular action network driving the emergence of this event. Two datasets (GSE113439 and GSE147507) from the GEO database were used for the identification of differentially expressed genes (DEGs). Common DEGs were selected by VennDiagram and their enrichment in biological pathways was analyzed. Candidate gene biomarkers were selected using three different machine-learning algorithms (SVM-RFE, LASSO, RF). The diagnostic efficacy of these candidate genes was validated using independent datasets. Finally, we performed drug prediction and molecular docking. We found 62 common DEGs, several of which were enriched in immune response and inflammation pathways. Two DEGs (SELE and CCL20) were identified by the machine-learning algorithms. They performed well in diagnostic tests on independent datasets. In particular, we observed an upregulation of functions associated with the adaptive immune response, the leukocyte-lymphocyte-driven immunological response, and the proinflammatory response. Moreover, by ssGSEA, natural killer T cells, activated dendritic cells, activated CD4 T cells, neutrophils, and plasmacytoid dendritic cells were correlated with COVID-19 and PH, with SELE and CCL20 showing the strongest correlation with dendritic cells. Potential therapeutic compounds such as fenretinide, aflatoxin B1, and 1-nitropyrene were predicted. Further molecular docking and molecular dynamics simulations showed that 1-nitropyrene had the most stable binding with SELE and CCL20. The findings indicated that SELE and CCL20 are novel diagnostic biomarkers for COVID-19 complicated with PH, and fenretinide and 1-nitropyrene, which target these two key genes, were predicted to be potential therapeutic agents, thus providing new insights into the prediction and treatment of COVID-19 complicated with PH in clinical practice.
5
Novel insights into immune-related genes associated with type 2 diabetes mellitus-related cognitive impairment. World J Diabetes 2024; 15:735-757. [PMID: 38680704 PMCID: PMC11045412 DOI: 10.4239/wjd.v15.i4.735] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 01/21/2024] [Accepted: 03/04/2024] [Indexed: 04/11/2024] Open
Abstract
BACKGROUND The cognitive impairment in type 2 diabetes mellitus (T2DM) is a multifaceted and advancing state that requires further exploration to fully comprehend. Neuroinflammation is considered to be one of the main mechanisms and the immune system has played a vital role in the progression of the disease. AIM To identify and validate the immune-related genes in the hippocampus associated with T2DM-related cognitive impairment. METHODS To identify differentially expressed genes (DEGs) between T2DM and controls, we used data from the Gene Expression Omnibus database GSE125387. To identify T2DM module genes, we used Weighted Gene Co-Expression Network Analysis. All the genes were subject to Gene Set Enrichment Analysis. Protein-protein interaction network construction and machine learning were utilized to identify three hub genes. Immune cell infiltration analysis was performed. The three hub genes were validated in GSE152539 via receiver operating characteristic curve analysis. Validation experiments including reverse transcription quantitative real-time PCR, Western blotting and immunohistochemistry were conducted both in vivo and in vitro. To identify potential drugs associated with hub genes, we used the Comparative Toxicogenomics Database (CTD). RESULTS A total of 576 DEGs were identified using GSE125387. By taking the intersection of DEGs, T2DM module genes, and immune-related genes, a total of 59 genes associated with the immune system were identified. Afterward, machine learning was utilized to identify three hub genes (H2-T24, Rac3, and Tfrc). The hub genes were associated with a variety of immune cells. The three hub genes were validated in GSE152539. Validation experiments were conducted at the mRNA and protein levels both in vivo and in vitro, consistent with the bioinformatics analysis. Additionally, 11 potential drugs associated with RAC3 and TFRC were identified based on the CTD. 
CONCLUSION Immune-related genes that differ in expression in the hippocampus are closely linked to microglia. We validated the expression of three hub genes both in vivo and in vitro, consistent with our bioinformatics results. We discovered 11 compounds associated with RAC3 and TFRC. These findings suggest that they are co-regulatory molecules of immunometabolism in diabetic cognitive impairment.
6
Cognitive and clinical predictors of a long-term course in obsessive compulsive disorder: A machine learning approach in a prospective cohort study. J Affect Disord 2024; 350:648-655. [PMID: 38246282 DOI: 10.1016/j.jad.2024.01.157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 12/20/2023] [Accepted: 01/14/2024] [Indexed: 01/23/2024]
Abstract
BACKGROUND Obsessive compulsive disorder (OCD) is a disabling illness with a chronic course, yet data on long-term outcomes are scarce. This study aimed to examine the long-term course of OCD in patients treated with different approaches (drugs, psychotherapy, and psychosurgery) and to identify predictors of clinical outcome by machine learning. METHOD We included outpatients with OCD treated at our referral unit. Demographic and neuropsychological data were collected at baseline using standardized instruments. Clinical data were collected at baseline, 12 weeks after starting pharmacological treatment prescribed at study inclusion, and after follow-up. RESULTS Of the 60 outpatients included, with follow-up data available for 5-17 years (mean = 10.6 years), 40 (67.7 %) were considered non-responders to adequate treatment at the end of the study. The best machine learning model achieved a correlation of 0.63 for predicting the long-term Yale-Brown Obsessive Compulsive Scale (Y-BOCS) score by adding clinical response (to the first pharmacological treatment) to the baseline clinical and neuropsychological characteristics. LIMITATIONS Our main limitations were the sample size, modest in the context of traditional ML studies, and the sample composition, more representative of rather severe OCD cases than of patients from the general community. CONCLUSIONS Many patients with OCD showed persistent and disabling symptoms at the end of follow-up despite comprehensive treatment that could include medication, psychotherapy, and psychosurgery. Machine learning algorithms can predict the long-term course of OCD using clinical and cognitive information to optimize treatment options.
7
Identifying antinuclear antibody positive individuals at risk for developing systemic autoimmune disease: development and validation of a real-time risk model. Front Immunol 2024; 15:1384229. [PMID: 38571954 PMCID: PMC10987951 DOI: 10.3389/fimmu.2024.1384229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 03/08/2024] [Indexed: 04/05/2024] Open
Abstract
Objective Positive antinuclear antibodies (ANAs) cause diagnostic dilemmas for clinicians. Currently, no tools exist to help clinicians interpret the significance of a positive ANA in individuals without diagnosed autoimmune diseases. We developed and validated a risk model to predict risk of developing autoimmune disease in positive ANA individuals. Methods Using a de-identified electronic health record (EHR), we randomly chart reviewed 2,000 positive ANA individuals to determine if a systemic autoimmune disease was diagnosed by a rheumatologist. A priori, we considered demographics, billing codes for autoimmune disease-related symptoms, and laboratory values as variables for the risk model. We performed logistic regression and machine learning models using training and validation samples. Results We assembled training (n = 1030) and validation (n = 449) sets. Positive ANA individuals who were younger, female, had a higher titer ANA, higher platelet count, disease-specific autoantibodies, and more billing codes related to symptoms of autoimmune diseases were all more likely to develop autoimmune diseases. The most important variables included having a disease-specific autoantibody, number of billing codes for autoimmune disease-related symptoms, and platelet count. In the logistic regression model, AUC was 0.83 (95% CI 0.79-0.86) in the training set and 0.75 (95% CI 0.68-0.81) in the validation set. Conclusion We developed and validated a risk model that predicts risk for developing systemic autoimmune diseases and can be deployed easily within the EHR. The model can risk stratify positive ANA individuals to ensure high-risk individuals receive urgent rheumatology referrals while reassuring low-risk individuals and reducing unnecessary referrals.
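The AUC values reported in this abstract measure how often the model ranks an individual who develops disease above one who does not. A minimal sketch of that pairwise definition is given below; the function name `roc_auc` and the example scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up risk scores for four individuals (two cases, two non-cases).
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why a drop from the training set to the validation set is usually read as a sign of some overfitting.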
8
Influence factors of CO adsorption on C2N-supported dual-atom catalysts unveiled by machine learning and twofold feature engineering. Phys Chem Chem Phys 2024; 26:9350-9355. [PMID: 38444345 DOI: 10.1039/d4cp00213j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/07/2024]
Abstract
Dual-atom catalysts (DACs) have emerged as a compelling frontier in the realm of the electrochemical carbon dioxide reduction reaction (CO2RR). However, elucidating the intrinsic properties of dual-atom pairs and their direct correlation with catalytic activity poses significant challenges. Herein, we investigate CO adsorption on 248 kinds of C2N-supported DACs and analyze the underlying structure-activity relationships of dual transition metal (TM) atoms based on density functional theory (DFT) calculations and machine learning (ML) models. Compared to the direct input of atomic features in the decision tree model of ML, we confirm that extra feature engineering with the introduction of the arithmetic combination of atomic features can better reflect the correlation of dual TM atoms on C2N-based DACs. Further feature importance analysis reveals a strong relationship between the CO binding strength and both the radius of the last occupied orbital (rv) and the group number (G) of the dual TM atoms, as well as a potential connection with the d band centre (εd). Our work provides deeper insights into the design of DACs and highlights the significance of twofold feature engineering for the synergistic effects between dual TM atoms.
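The "arithmetic combination of atomic features" mentioned in this abstract can be illustrated with a short sketch. The abstract does not state which operations were used, so the sum, absolute difference and product below are assumptions chosen because they are symmetric in the two atoms; the function name `pair_features` is hypothetical.

```python
def pair_features(atom_a, atom_b):
    """Combine per-atom descriptors of a dual-atom pair.

    Each descriptor shared by the two atoms yields three pair-level
    features that do not depend on the order of the atoms.
    """
    combined = {}
    for name in atom_a:
        a, b = atom_a[name], atom_b[name]
        combined[f"{name}_sum"] = a + b
        combined[f"{name}_absdiff"] = abs(a - b)
        combined[f"{name}_prod"] = a * b
    return combined

# Group numbers (G) of Fe and Ni in the periodic table: 8 and 10.
fe_ni = pair_features({"G": 8}, {"G": 10})
# fe_ni == {"G_sum": 18, "G_absdiff": 2, "G_prod": 80}
```

Order-independent combinations are a natural choice here because a Fe-Ni pair and a Ni-Fe pair describe the same catalyst and should map to the same feature vector.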
9
A deep learning method for empirical spectral prediction and inverse design of all-optical nonlinear plasmonic ring resonator switches. Sci Rep 2024; 14:5787. [PMID: 38461205 PMCID: PMC10924975 DOI: 10.1038/s41598-024-56522-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/07/2024] [Indexed: 03/11/2024] Open
Abstract
All-optical plasmonic switches (AOPSs) utilizing surface plasmon polaritons are well-suited for integration into photonic integrated circuits (PICs) and play a crucial role in advancing all-optical signal processing. The current AOPS design methods still rely on trial-and-error or empirical approaches. In contrast, recent deep learning (DL) advances have proven highly effective as computational tools, offering an alternative means to accelerate nanophotonics simulations. This paper proposes an innovative approach utilizing DL for spectrum prediction and inverse design of AOPS. The switches employ circular nonlinear plasmonic ring resonators (NPRRs) composed of interconnected metal-insulator-metal waveguides with a ring resonator. The NPRR switching performance is shown using the nonlinear Kerr effect. The forward model presented in this study demonstrates superior computational efficiency when compared to the finite-difference time-domain method. The model analyzes various structural parameters to predict transmission spectra with a distinctive dip. Inverse modeling enables the prediction of design parameters for desired transmission spectra. This model provides a rapid estimation of design parameters, offering a clear advantage over time-intensive conventional optimization approaches. The loss of prediction for both the forward and inverse models, when compared to simulations, is exceedingly low and on the order of 10⁻⁴. The results confirm the suitability of employing DL for forward and inverse design of AOPSs in PICs.
10
Development and validation of a nomogram to predict hypothermia in adult burn patients during escharectomy under general anesthesia. Burns 2024; 50:93-105. [PMID: 37821272 DOI: 10.1016/j.burns.2023.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 04/13/2023] [Accepted: 06/12/2023] [Indexed: 10/13/2023]
Abstract
BACKGROUND Hypothermia is very common in burn patients during escharectomy under general anesthesia; it increases blood transfusion demand and may lead to coagulation disorders or even increased mortality. It is important to predict the occurrence of hypothermia in advance, but a prognostic prediction model is lacking. Our study aimed to develop a nomogram to predict the incidence of hypothermia in adult burn patients undergoing escharectomy under general anesthesia, so that the hazards associated with hypothermia can be addressed early. METHODS This retrospective study included 978 adult burn patients who underwent simple escharectomy under general anesthesia during hospitalization between January 2017 and December 2022; they were divided into a training cohort and a validation cohort. Clinical data were recorded in the electronic medical record system and a self-made collection table of intraoperative hypothermia. Preliminary predictive factors for hypothermia in burn patients undergoing simple escharectomy under general anesthesia were first determined using the least absolute shrinkage and selection operator (LASSO); the final predictive factors were then determined using binary logistic regression analyses, and a nomogram to predict the occurrence of hypothermia was established. The index of concordance (C-index), calibration curves, receiver operating characteristic (ROC) curve, and decision curve analysis (DCA) were used to evaluate the performance of the model. RESULTS A total of 211 patients with hypothermia and 767 patients without hypothermia were included. LASSO regression and binary logistic regression showed that burn index, urinary volume, blood transfusion volume and irrigation volume were significantly associated with hypothermia in burn patients undergoing escharectomy under general anesthesia. The nomogram based on these four variables had good predictive efficiency for hypothermia in adult burn patients during escharectomy under general anesthesia. The C-index in the training cohort was 0.903. The areas under the receiver operating characteristic curves (AUROC) for the training cohort (95% CI 0.877-0.920) and the validation cohort (0.875; 95% CI 0.852-0.897) indicated satisfactory discriminative ability of the nomogram, and the calibration curves for both cohorts also fit well, indicating that the nomogram had good clinical application value. CONCLUSIONS Hypothermia in burn patients during escharectomy under general anesthesia is associated with burn index, urinary volume, blood transfusion volume and irrigation volume. We successfully developed a practical nomogram to accurately predict hypothermia, which can help clinicians rapidly and conveniently diagnose hypothermia and guide its treatment in burn patients during escharectomy under general anesthesia.
11
Analysis and validation of diagnostic biomarkers and immune cell infiltration characteristics in pediatric sepsis by integrating bioinformatics and machine learning. World J Pediatr 2023; 19:1094-1103. [PMID: 37115484 PMCID: PMC10533616 DOI: 10.1007/s12519-023-00717-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 03/10/2023] [Indexed: 04/29/2023]
Abstract
BACKGROUND Pediatric sepsis is a complicated condition characterized by life-threatening organ failure resulting from a dysregulated host response to infection in children. It is associated with high rates of morbidity and mortality, and rapid detection and administration of antimicrobials have been emphasized. The objective of this study was to evaluate the diagnostic biomarkers of pediatric sepsis and the function of immune cell infiltration in the development of this illness. METHODS Three gene expression datasets were available from the Gene Expression Omnibus collection. First, the differentially expressed genes (DEGs) were found with the use of the R program, and then gene set enrichment analysis was carried out. Subsequently, the DEGs were combined with the major module genes chosen using the weighted gene co-expression network. The hub genes were identified by the use of three machine-learning algorithms: random forest, support vector machine-recursive feature elimination, and least absolute shrinkage and selection operator. The receiver operating characteristic curve and nomogram model were used to verify the discrimination and efficacy of the hub genes. In addition, the inflammatory and immune status of pediatric sepsis was assessed using cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT). The relationship between the diagnostic markers and infiltrating immune cells was further studied. RESULTS Overall, after overlapping key module genes and DEGs, we detected 402 overlapping genes. As pediatric sepsis diagnostic indicators, CYSTM1 (AUC = 0.988), MMP8 (AUC = 0.973), and CD177 (AUC = 0.986) were investigated and demonstrated statistically significant differences (P < 0.05) and diagnostic efficacy in the validation set. As indicated by the immune cell infiltration analysis, multiple immune cells may be involved in the development of pediatric sepsis. 
Additionally, all diagnostic characteristics may correlate with immune cells to varying degrees. CONCLUSIONS The candidate hub genes (CD177, CYSTM1, and MMP8) were identified, and the nomogram was constructed for pediatric sepsis diagnosis. Our study could provide potential peripheral blood diagnostic candidate genes for pediatric sepsis patients.
12
Groundwater quality modeling and determining critical points: a comparison of machine learning to Best-Worst Method. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:115758-115775. [PMID: 37889408 DOI: 10.1007/s11356-023-30530-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 10/13/2023] [Indexed: 10/28/2023]
Abstract
In Iran, similar to other developing countries, groundwater quality has been seriously threatened. Therefore, this study aimed to apply Machine Learning Algorithms (MLAs) in Groundwater Quality Modeling (GQM) and determine the optimal algorithm using the Best-Worst Method (BWM) in Ardabil province, Iran. Groundwater quality parameters included calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), potassium (K⁺), chloride (Cl⁻), sulfate (SO₄²⁻), total dissolved solids (TDS), bicarbonate (HCO₃⁻), electrical conductivity (EC), and acidity (pH). Next, seven MLAs, including Support Vector Regression (SVR), Random Forest (RF), Decision Tree Regressor (DTR), K-Nearest Neighbor (KNN), Naïve Bayes, Simple Linear Regression (SLR), and Support Vector Machine (SVM), were implemented in the Python programming language to model groundwater quality. Finally, BWM was used to validate the results of the MLAs. Examination of the error statistics showed that the RF algorithm, with MAE = 0.28, MSE = 0.12, RMSE = 0.35, and AUC = 0.93, was the optimal MLA for groundwater quality modeling. The Schoeller diagram also showed that various ion ratios, including Na⁺+K⁺, Ca²⁺, Mg²⁺, Cl⁻, and HCO₃⁻+CO₃²⁻, had upward average values at most of the sampled points. The results of the BWM showed great similarity between the results of the RF algorithm and the BWM classification. These results showed that more than 50% of the studied area had low quality based on hydro-chemical parameters of groundwater quality. The findings of this research can assist managers and planners in developing suitable management models and implementing appropriate strategies for the optimal exploitation of groundwater resources.
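The error statistics quoted in this abstract are related by simple formulas, and the reported values are mutually consistent: RMSE is the square root of MSE, and √0.12 ≈ 0.35. A minimal sketch follows; the function name `error_stats` and the sample numbers are illustrative assumptions, not data from the study.

```python
import math

def error_stats(y_true, y_pred):
    """Mean absolute error, mean squared error and root-mean-square error."""
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}

# Consistency check on the reported Random Forest figures:
# MSE = 0.12 implies RMSE = sqrt(0.12), which rounds to 0.35.
reported_rmse = round(math.sqrt(0.12), 2)
```

Because MSE squares each error, it penalizes a few large misses more heavily than MAE does, so reporting both gives a rough picture of how uneven the errors are.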
13
Application of Machine Learning in Material Synthesis and Property Prediction. MATERIALS (BASEL, SWITZERLAND) 2023; 16:5977. [PMID: 37687675 PMCID: PMC10488794 DOI: 10.3390/ma16175977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Revised: 08/22/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023]
Abstract
Material innovation plays a very important role in technological progress and industrial development. Traditional experimental exploration and numerical simulation often require considerable time and resources. A new approach is urgently needed to accelerate the discovery and exploration of new materials. Machine learning can greatly reduce computational costs, shorten the development cycle, and improve computational accuracy. It has become one of the most promising research approaches in the process of novel material screening and material property prediction. In recent years, machine learning has been widely used in many fields of research, such as superconductivity, thermoelectrics, photovoltaics, catalysis, and high-entropy alloys. In this review, the basic principles of machine learning are briefly outlined. Several commonly used algorithms in machine learning models and their primary applications are then introduced. The research progress of machine learning in predicting material properties and guiding material synthesis is discussed. Finally, a future outlook on machine learning in the materials science field is presented.
14
Machine learning-assisted data filtering and QSAR models for prediction of chemical acute toxicity on rat and mouse. JOURNAL OF HAZARDOUS MATERIALS 2023; 452:131344. [PMID: 37027914 DOI: 10.1016/j.jhazmat.2023.131344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2022] [Revised: 03/20/2023] [Accepted: 03/31/2023] [Indexed: 05/03/2023]
Abstract
Machine learning (ML) methods provide a new opportunity to build quantitative structure-activity relationship (QSAR) models for predicting chemicals' toxicity based on large toxicity data sets, but they are limited by insufficient model robustness due to poor data set quality for chemicals with certain structures. To address this issue and improve model robustness, we built a large data set on rat oral acute toxicity for thousands of chemicals, then used ML to filter chemicals favorable for regression models (CFRM). In comparison to chemicals not favorable for regression models (CNRM), CFRM accounted for 67% of chemicals in the original data set, and had a higher structural similarity and a narrower toxicity distribution within 2-4 log10 (mg/kg). The performance of established regression models for CFRM was greatly improved, with root-mean-square errors (RMSE) in the range of 0.45-0.48 log10 (mg/kg). Classification models were built for CNRM using all chemicals in the original data set, and the area under the receiver operating characteristic curve (AUROC) reached 0.75-0.76. The proposed strategy was successfully applied to a mouse oral acute toxicity data set, yielding RMSE and AUROC in the range of 0.36-0.38 log10 (mg/kg) and 0.79, respectively.
15
Artificial intelligence-assisted selection and efficacy prediction of antineoplastic strategies for precision cancer therapy. Semin Cancer Biol 2023; 90:57-72. [PMID: 36796530 DOI: 10.1016/j.semcancer.2023.02.005] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 10/19/2022] [Revised: 01/12/2023] [Accepted: 02/13/2023] [Indexed: 02/16/2023]
Abstract
The rapid development of artificial intelligence (AI) technologies in the context of the vast amount of collectable data obtained from high-throughput sequencing has led to an unprecedented understanding of cancer and accelerated the advent of a new era of clinical oncology characterized by precision treatment and personalized medicine. However, the gains achieved by a variety of AI models in clinical oncology practice are far from what one would expect, and in particular, there are still many uncertainties in the selection of clinical treatment options that pose significant challenges to the application of AI in clinical oncology. In this review, we summarize emerging approaches, relevant datasets and open-source software of AI and show how to integrate them to address problems from clinical oncology and cancer research. We focus on the principles and procedures for identifying different antitumor strategies with the assistance of AI, including targeted cancer therapy, conventional cancer therapy, and cancer immunotherapy. In addition, we highlight the current challenges and directions of AI in clinical oncology translation. Overall, we hope this article will provide researchers and clinicians with a deeper understanding of the role and implications of AI in precision cancer therapy, and help AI move more quickly into accepted cancer guidelines.
|
16
|
Identification and validation of cuproptosis related genes and signature markers in bronchopulmonary dysplasia disease using bioinformatics analysis and machine learning. BMC Med Inform Decis Mak 2023; 23:69. [PMID: 37060021 PMCID: PMC10105406 DOI: 10.1186/s12911-023-02163-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/31/2023] [Accepted: 03/31/2023] [Indexed: 04/16/2023] Open
Abstract
BACKGROUND Bronchopulmonary Dysplasia (BPD) has a high incidence and affects the health of preterm infants. Cuproptosis is a novel form of cell death, but its mechanism of action in the disease is not yet clear. Machine learning, the latest tool for the analysis of biological samples, is still relatively rarely used for in-depth analysis and prediction of diseases. METHODS AND RESULTS First, the differential expression of cuproptosis-related genes (CRGs) in the GSE108754 dataset was extracted, and the heat map showed that the expression of the NFE2L2 gene was significantly higher in the control group whereas the expression of the GLS gene was significantly higher in the treatment group. Chromosome location analysis showed that both genes were positively correlated and associated with chromosome 2. The results of immune infiltration and immune cell differential analysis showed differences in four immune cell types, most significantly in monocytes. Five new pathways were analyzed through two subgroups based on consistent clustering of CRG expression. In weighted correlation network analysis (WGCNA), the screening condition was set to the top 25% to obtain the disease signature genes. Four machine learning algorithms, Generalized Linear Models (GLM), Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB), were used to screen the disease signature genes and obtain the final five marker genes for disease prediction. The models constructed by the GLM method proved to be more accurate in validation on two datasets, GSE190215 and GSE188944. CONCLUSION We eventually identified two copper death-associated genes, NFE2L2 and GLS. A machine learning model (GLM) was constructed to predict the prevalence of BPD disease, and five disease signature genes, NFATC3, ERMN, PLA2G4A, MTMR9LP and LOC440700, were identified. These genes, identified through bioinformatics analysis, could be potential targets for identifying and treating BPD.
|
17
|
A predictive model for early clinical diagnosis of spinal tuberculosis based on conventional laboratory indices: A multicenter real-world study. Front Cell Infect Microbiol 2023; 13:1150632. [PMID: 37033479 PMCID: PMC10080113 DOI: 10.3389/fcimb.2023.1150632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 03/14/2023] [Indexed: 04/11/2023] Open
Abstract
Background Early diagnosis of spinal tuberculosis (STB) remains challenging. The aim of this study was to develop a predictive model for the early diagnosis of STB based on conventional laboratory indicators. Method The clinical data of patients with suspected STB in four hospitals were included, and variables were screened by Lasso regression. Eighty-five percent of the cases in the dataset were randomly selected as the training set, and the other 15% were selected as the validation set. The diagnostic prediction model was established by logistic regression in the training set, and the nomogram was drawn. The diagnostic performance of the model was verified in the validation set. Result A total of 206 patients were included in the study, including 105 patients with STB and 101 patients without STB (NSTB). Twelve variables were screened by Lasso regression and modeled by logistic regression, and seven variables (TB.antibody, IGRAs, RBC, Mono%, RDW, AST, BUN) were finally included in the model. The model achieved an AUC of 0.9468 and 0.9188 in the training and validation cohorts, respectively. Conclusion In this study, we developed a prediction model for the early diagnosis of STB which consisted of seven routine laboratory indicators.
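The 85%/15% random split and the AUC metric mentioned in this abstract can be illustrated with a short standard-library Python sketch. The function names, the seed, and the toy scores are hypothetical; the abstract does not describe the study's software, so this is only an illustration of the two ideas, not the authors' procedure.

```python
import random

def split_train_validation(items, train_fraction=0.85, seed=0):
    """Randomly split items into a training and a validation subset.

    The seed and the rounding rule are assumptions made for this sketch.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def auc(labels, scores):
    """AUC as the probability that a positive case scores higher than a
    negative one, with ties counted as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy usage: 206 hypothetical patient indices, then a tiny made-up score list.
train_ids, valid_ids = split_train_validation(range(206))
toy_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

The pairwise definition of AUC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients; rank-based formulas give the same value faster on large datasets.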
|
18
|
Continuous cuffless and non-invasive measurement of arterial blood pressure—concepts and future perspectives. Blood Press 2022; 31:254-269. [DOI: 10.1080/08037051.2022.2128716] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
19
|
Identification of Biomarkers for Methamphetamine Exposure Time Prediction in Mice Using Metabolomics and Machine Learning Approaches. Metabolites 2022; 12:metabo12121250. [PMID: 36557288 PMCID: PMC9780981 DOI: 10.3390/metabo12121250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Revised: 12/04/2022] [Accepted: 12/07/2022] [Indexed: 12/14/2022] Open
Abstract
Methamphetamine (METH) abuse has become a global public health and safety problem. More information is needed to identify the time of drug abuse. In this study, methamphetamine was administered to male C57BL/6J mice with increasing doses from 5 to 30 mg/kg (once a day, i.p.) for 20 days. Serum and urine samples were collected for metabolomics studies using gas chromatography-mass spectrometry (GC-MS). Six machine learning models were used to infer the time of drug abuse and the best model was selected to predict administration time preliminarily. The metabolic changes caused by methamphetamine were explored. The metabolic patterns of methamphetamine-exposed mice were quite different from those of the control group and changed over time. Specifically, serum metabolomics showed enhanced amino acid metabolism and increased fatty acid consumption, while urine metabolomics showed slowed metabolism of the tricarboxylic acid (TCA) cycle, increased organic acid excretion, and abnormal purine metabolism. Phenylalanine in serum and glutamine in urine increased, while palmitic acid, 5-HT, and monopalmitin in serum and gamma-aminobutyric acid in urine decreased significantly. Among the six machine learning models, the random forest model was the best to predict the exposure time (serum: MAE = 1.482, RMSE = 1.69, R squared = 0.981; urine: MAE = 2.369, RMSE = 1.926, R squared = 0.946). The potential biomarker set containing four metabolites in the serum (palmitic acid, 5-hydroxytryptamine, monopalmitin, and phenylalanine) facilitated the identification of methamphetamine exposure. The random forest model helped predict the methamphetamine exposure time based on these potential biomarkers.
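For readers comparing the MAE, RMSE and R squared figures in this abstract, the following Python sketch shows how the three metrics are commonly computed from paired true and predicted values. The function name and the toy numbers are illustrative assumptions, not data from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R squared) for paired true and predicted values."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2

# Toy values (made up): exposure days and hypothetical model predictions.
mae, rmse, r2 = regression_metrics([1, 2, 3, 4], [1, 2, 3, 6])
```

On any single set of residuals, RMSE is at least as large as MAE. The urine figures quoted in the abstract (MAE above RMSE) therefore presumably come from different evaluation sets or conventions; the abstract does not say which.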
|
20
|
Machine learning for predicting elections in Latin America based on social media engagement and polls. GOVERNMENT INFORMATION QUARTERLY 2022. [DOI: 10.1016/j.giq.2022.101782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
|
21
|
Effective Approaches to Fetal Brain Segmentation in MRI and Gestational Age Estimation by Utilizing a Multiview Deep Inception Residual Network and Radiomics. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1708. [PMID: 36554113 PMCID: PMC9778347 DOI: 10.3390/e24121708] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 11/16/2022] [Accepted: 11/18/2022] [Indexed: 06/17/2023]
Abstract
To completely comprehend neurodevelopment in healthy and congenitally abnormal fetuses, quantitative analysis of the human fetal brain is essential. This analysis requires the use of automatic multi-tissue fetal brain segmentation techniques. This paper proposes an end-to-end automatic yet effective method for a multi-tissue fetal brain segmentation model called IRMMNET. It includes an inception residual encoder block (EB) and a dense spatial attention (DSAM) block, which facilitate the extraction of multi-scale fetal-brain-tissue-relevant information from multi-view MRI images, enhance the feature reuse, and substantially reduce the number of parameters of the segmentation model. Additionally, we propose three methods for predicting gestational age (GA): GA prediction using a 3D autoencoder, GA prediction using radiomics features, and GA prediction using the IRMMNET segmentation model's encoder. Our experiments were performed on a dataset of 80 pathological and non-pathological magnetic resonance fetal brain volume reconstructions across a range of gestational ages (20 to 33 weeks) that were manually segmented into seven different tissue categories. The results showed that the proposed fetal brain segmentation model achieved a Dice score of 0.791±0.18, outperforming the state-of-the-art methods. The radiomics-based GA prediction methods achieved the best results (RMSE: 1.42). We also demonstrated the generalization capabilities of the proposed methods for tasks such as head and neck tumor segmentation and the prediction of patients' survival days.
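The Dice score reported in this abstract measures the overlap between a predicted and a reference segmentation. A minimal Python sketch of the standard definition for one tissue class follows; the function name, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions made for this example.

```python
def dice_score(pred_mask, true_mask):
    """Dice coefficient 2*|A and B| / (|A| + |B|) for two binary masks
    given as equal-length flat sequences of 0/1 values."""
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total

# Toy usage on a made-up 4-voxel mask.
toy_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

For a multi-tissue problem such as the seven categories mentioned above, a score like this would typically be computed per class and then averaged, although the abstract does not state the exact averaging used.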
|
22
|
A comprehensive survey on computational learning methods for analysis of gene expression data. Front Mol Biosci 2022; 9:907150. [PMID: 36458095 PMCID: PMC9706412 DOI: 10.3389/fmolb.2022.907150] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/29/2022] [Accepted: 09/28/2022] [Indexed: 09/19/2023] Open
Abstract
Computational analysis methods including machine learning have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis for classification of sample observations, or discovery of feature genes requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied for the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values, and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery along with their evaluation parameters are described in detail. We believe that this detailed review will help the users to select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
|
23
|
A hierarchical estimation of multi-modal distribution programming for regression problems. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.110129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
|
24
|
The use of predictive models to develop chromatography-based purification processes. Front Bioeng Biotechnol 2022; 10:1009102. [PMID: 36312533 PMCID: PMC9605695 DOI: 10.3389/fbioe.2022.1009102] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/01/2022] [Accepted: 09/23/2022] [Indexed: 11/13/2022] Open
Abstract
Chromatography is the workhorse of biopharmaceutical downstream processing because it can selectively enrich a target product while removing impurities from complex feed streams. This is achieved by exploiting differences in molecular properties, such as size, charge and hydrophobicity (alone or in different combinations). Accordingly, many parameters must be tested during process development in order to maximize product purity and recovery, including resin and ligand types, conductivity, pH, gradient profiles, and the sequence of separation operations. The number of possible experimental conditions quickly becomes unmanageable. Although the range of suitable conditions can be narrowed based on experience, the time and cost of the work remain high even when using high-throughput laboratory automation. In contrast, chromatography modeling using inexpensive, parallelized computer hardware can provide expert knowledge, predicting conditions that achieve high purity and efficient recovery. The prediction of suitable conditions in silico reduces the number of empirical tests required and provides in-depth process understanding, which is recommended by regulatory authorities. In this article, we discuss the benefits and specific challenges of chromatography modeling. We describe the experimental characterization of chromatography devices and settings prior to modeling, such as the determination of column porosity. We also consider the challenges that must be overcome when models are set up and calibrated, including the cross-validation and verification of data-driven and hybrid (combined data-driven and mechanistic) models. This review will therefore support researchers intending to establish a chromatography modeling workflow in their laboratory.
|
25
|
Neural network for multi-exponential sound energy decay analysis. THE JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA 2022; 152:942. [PMID: 36050155 DOI: 10.1121/10.0013416] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2021] [Accepted: 07/18/2022] [Indexed: 06/15/2023]
Abstract
An established model for sound energy decay functions (EDFs) is the superposition of multiple exponentials and a noise term. This work proposes a neural-network-based approach for estimating the model parameters from EDFs. The network is trained on synthetic EDFs and evaluated on two large datasets of over 20 000 EDF measurements conducted in various acoustic environments. The evaluation shows that the proposed neural network architecture robustly estimates the model parameters from large datasets of measured EDFs while being lightweight and computationally efficient. An implementation of the proposed neural network is publicly available.
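The abstract describes the decay model only in words. As a rough illustration, the Python sketch below evaluates one plausible parameterization: each exponential is written so that its energy falls by 60 dB over its decay time, and the noise term is taken as linear in the remaining time. This exact form, and every name in it, is an assumption made for illustration; it is not taken from the paper or its published implementation.

```python
import math

# ln(10^6) is about 13.8: an energy ratio of 10^-6 corresponds to a 60 dB drop.
LN_60_DB = math.log(1e6)

def energy_decay(t, amplitudes, decay_times, noise_level, length):
    """Evaluate an assumed multi-exponential decay model at time t (seconds).

    amplitudes  -- one amplitude per exponential (hypothetical parameter)
    decay_times -- one 60 dB decay time per exponential, in seconds
    noise_level -- constant noise energy per unit time (hypothetical)
    length      -- total duration of the decay function, in seconds
    """
    exponential_part = sum(
        a * math.exp(-LN_60_DB * t / decay_time)
        for a, decay_time in zip(amplitudes, decay_times)
    )
    noise_part = noise_level * (length - t)  # assumed linear noise term
    return exponential_part + noise_part

# Toy usage: two slopes plus a small noise floor, sampled at two instants.
early = energy_decay(0.5, [1.0, 0.5], [0.5, 2.0], 1e-3, 2.0)
late = energy_decay(1.0, [1.0, 0.5], [0.5, 2.0], 1e-3, 2.0)
```

A parameter estimator, whether a neural network as in the abstract or a classical fit, would recover the amplitudes, decay times and noise level from sampled values of a curve like this one.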
|
26
|
Potential association factors for developing effective peptide-based cancer vaccines. Front Immunol 2022; 13:931612. [PMID: 35967400 PMCID: PMC9364268 DOI: 10.3389/fimmu.2022.931612] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2022] [Accepted: 06/29/2022] [Indexed: 11/26/2022] Open
Abstract
Peptide-based cancer vaccines have been shown to boost immune systems to kill tumor cells in cancer patients. However, designing an effective T cell epitope peptide-based cancer vaccine remains a challenge and is a major hurdle for the application of cancer vaccines. In this study, we constructed for the first time a library of peptide-based cancer vaccines and their clinical attributes, named CancerVaccine (https://peptidecancervaccine.weebly.com/). To investigate the association factors that influence the effectiveness of cancer vaccines, these peptide-based cancer vaccines were classified into high (HCR) and low (LCR) clinical responses based on their clinical efficacy. Our study highlights that modified peptides derived from artificially modified proteins are suitable as cancer vaccines, especially for melanoma. Screening for HLA class II affinity peptides may be an effective strategy for advancing cancer vaccines. In addition, the treatment regimen has the potential to influence the clinical response of a cancer vaccine, and Montanide ISA-51 might be an effective adjuvant. Finally, we constructed a high sensitivity and specificity machine learning model to assist in designing peptide-based cancer vaccines capable of providing high clinical responses. Together, our findings illustrate that a high clinical response following peptide-based cancer vaccination is correlated with the right type of peptide, the appropriate adjuvant, and a matched HLA allele, as well as an appropriate treatment regimen. This study would allow for enhanced development of cancer vaccines.
|
27
|
A Machine Learning Approach in Autism Spectrum Disorders: From Sensory Processing to Behavior Problems. Front Mol Neurosci 2022; 15:889641. [PMID: 35615066 PMCID: PMC9126208 DOI: 10.3389/fnmol.2022.889641] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2022] [Accepted: 04/06/2022] [Indexed: 12/01/2022] Open
Abstract
Atypical sensory processing described in autism spectrum disorders (ASDs) frequently cascades into behavioral alterations: isolation, aggression, indifference, anxious/depressed states, or attention problems. Predictive machine learning models might refine the statistical explorations of the associations between them by finding out how these dimensions are related. This study investigates whether behavior problems can be predicted using sensory processing abilities. Participants were 72 children and adolescents (21 females) diagnosed with ASD, aged between 6 and 14 years (M = 7.83 years; SD = 2.80 years). Parents of the participants were invited to answer the Sensory Profile 2 (SP2) and the Child Behavior Checklist (CBCL) questionnaires. A collection of 26 supervised machine learning regression models of different families was developed to predict the CBCL outcomes using the SP2 scores. The most reliable predictions were for the following outcomes: total problems (using the items in the SP2 touch scale as inputs), anxiety/depression (using avoiding quadrant), social problems (registration), and externalizing scales, revealing interesting relations between CBCL outcomes and SP2 scales. The prediction reliability on the remaining outcomes was “moderate to good” except somatic complaints and rule-breaking, where it was “bad to moderate.” Linear and ridge regression achieved the best prediction for a single outcome and globally, respectively, and gradient boosting machine achieved the best prediction in three outcomes. Results highlight the utility of several machine learning models in studying the predictive value of sensory processing impairments (with an early onset) on specific behavior alterations, providing evidence of a relationship between sensory processing impairments and behavior problems in ASD.
|
28
|
An Ensemble Learning Based Classification Approach for the Prediction of Household Solid Waste Generation. SENSORS (BASEL, SWITZERLAND) 2022; 22:3506. [PMID: 35591195 PMCID: PMC9104882 DOI: 10.3390/s22093506] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/29/2022] [Revised: 04/20/2022] [Accepted: 04/26/2022] [Indexed: 05/07/2023]
Abstract
With the increase in urbanization and smart cities initiatives, the management of waste generation has become a fundamental task. Recent studies have started applying machine learning techniques to prognosticate solid waste generation to assist authorities in the efficient planning of waste management processes, including collection, sorting, disposal, and recycling. However, identifying the best machine learning model to predict solid waste generation is a challenging endeavor, especially in view of the limited datasets and lack of important predictive features. In this research, we developed an ensemble learning technique that combines the advantages of (1) a hyperparameter optimization and (2) a meta regressor model to accurately predict the weekly waste generation of households within urban cities. The hyperparameter optimization of the models is achieved using the Optuna algorithm, while the outputs of the optimized single machine learning models are used to train the meta linear regressor. The ensemble model consists of an optimized mixture of machine learning models with different learning strategies. The proposed ensemble method achieved an R2 score of 0.8 and a mean percentage error of 0.26, outperforming the existing state-of-the-art approaches, including SARIMA, NARX, LightGBM, KNN, SVR, ETS, RF, XGBoosting, and ANN, in predicting future waste generation. Not only did our model outperform the optimized single machine learning models, but it also surpassed the average ensemble results of the machine learning models. Our findings suggest that using the proposed ensemble learning technique, even in the case of a feature-limited dataset, can significantly boost the model performance in predicting future household waste generation compared to individual learners. Moreover, the practical implications for the research community and respective city authorities are discussed.
|
29
|
Multisite and Multitemporal Grassland Yield Estimation Using UAV-Borne Hyperspectral Data. REMOTE SENSING 2022. [DOI: 10.3390/rs14092068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
Abstract
Grassland ecosystems can be hotspots of biodiversity and act as carbon sinks while at the same time providing the basis of forage production for ruminants in dairy and meat production. Annual grassland dry matter yield (DMY) is one of the most important agronomic parameters reflecting differences in usage intensity such as number of harvests and fertilization. Current methods for grassland DMY estimation are labor-intensive and prone to error due to small sample size. With the advent of unmanned aerial vehicles (UAVs) and miniaturized hyperspectral sensors, a novel tool for remote sensing of grassland with high spatial, temporal and radiometric resolution and coverage is available. The present study aimed at developing a robust model capable of estimating grassland biomass across a gradient of usage intensity throughout one growing season. Therefore, UAV-borne hyperspectral data from eight grassland sites in North Hesse, Germany, originating from different harvests, were utilized for the modeling of fresh matter yield (FMY) and DMY. Four machine learning (ML) algorithms were compared for their modeling performance. Among them, the rule-based ML method Cubist regression (CBR) performed best, delivering high prediction accuracies for both FMY (nRMSEp 7.6%, Rp2 0.87) and DMY (nRMSEp 12.9%, Rp2 0.75). The model showed a high robustness across sites and harvest dates. The best models were employed to produce maps for FMY and DMY, enabling the detailed analysis of spatial patterns. Although the complexity of the approach still restricts its practical application in agricultural management, the current study proved that biomass of grassland sites being subject to different management intensities can be modeled from UAV-borne hyperspectral data at high spatial resolution with high prediction accuracies.
|
30
|
Optimization of Heterogeneous Catalyst-assisted Fatty Acid Methyl Esters Biodiesel Production from Soybean Oil with Different Machine Learning Methods. ARAB J CHEM 2022. [DOI: 10.1016/j.arabjc.2022.103915] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 11/17/2022] Open
|
31
|
Comparison of logP and logD correction models trained with public and proprietary data sets. J Comput Aided Mol Des 2022; 36:253-262. [PMID: 35359246 DOI: 10.1007/s10822-022-00450-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Accepted: 03/15/2022] [Indexed: 10/18/2022]
Abstract
In drug discovery, partition and distribution coefficients, logP and logD for octanol/water, are widely used as metrics of the lipophilicity of molecules, which in turn has a strong influence on the bioactivity and bioavailability of potential drugs. There are a variety of established methods, mostly fragment or atom-based, to calculate logP, while logD prediction generally relies on calculated logP and pKa for the estimation of neutral and ionized populations at a given pH. Algorithms such as ClogP have limitations generally leading to systematic errors for chemically related molecules, while pKa estimation is generally more difficult due to the interplay of electronic, inductive and conjugation effects for ionizable moieties. We propose an integrated machine learning QSAR modeling approach to predict logD by training the model with experimental data while using ClogP and pKa predicted by commercial software as model descriptors. By optimizing the loss function for the ClogD calculated by the software, we build a correction model that incorporates both descriptors from the software and available experimental logD data. Additionally, we calculate logP from the logD model using the software-predicted pKa values. Here, we have trained models using publicly or commercially available logD data to show that this approach can improve on commercial software predictions of lipophilicity. When applied to other logD data sets, this approach extends the domain of applicability of logD and logP predictions over commercial software. The performance of these models compares favorably with models built with a larger set of proprietary logD data.
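The dependence of logD on logP, pKa and pH that this abstract refers to can be made concrete with the textbook relation for a compound with a single ionizable group, under the assumption that only the neutral species partitions into octanol. The Python sketch below implements that relation and its inverse; it is not the correction model proposed in the paper, and the function names are hypothetical.

```python
import math

def _ionized_to_neutral_ratio(pka, ph, kind):
    """Ratio of ionized to neutral species for a single ionizable group."""
    if kind == "acid":
        return 10.0 ** (ph - pka)
    if kind == "base":
        return 10.0 ** (pka - ph)
    raise ValueError("kind must be 'acid' or 'base'")

def log_d_from_log_p(log_p, pka, ph, kind):
    """logD at a given pH, assuming only the neutral form enters octanol."""
    return log_p - math.log10(1.0 + _ionized_to_neutral_ratio(pka, ph, kind))

def log_p_from_log_d(log_d, pka, ph, kind):
    """Inverse relation: recover logP from logD under the same assumption."""
    return log_d + math.log10(1.0 + _ionized_to_neutral_ratio(pka, ph, kind))

# Toy usage: a made-up acid with logP 3.0 and pKa 4.0, evaluated at pH 7.4.
toy_log_d = log_d_from_log_p(3.0, 4.0, 7.4, "acid")
```

Because the relation is exact only for a single ionizable group and a non-partitioning ion, errors in the predicted pKa or logP propagate directly into logD, which is one motivation for a data-driven correction of the kind the abstract describes.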
|
32
|
Development and validation of a model to predict rebleeding within three days after endoscopic hemostasis for high-risk peptic ulcer bleeding. BMC Gastroenterol 2022; 22:64. [PMID: 35164682 PMCID: PMC8843020 DOI: 10.1186/s12876-022-02145-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 01/31/2022] [Indexed: 11/17/2022] Open
Abstract
Background Peptic ulcer bleeding remains a typical medical emergency with significant morbidity and mortality. Peptic ulcer rebleeding often occurs within three days after emergent endoscopic hemostasis. Our study aims to develop a nomogram to predict rebleeding within three days after emergent endoscopic hemostasis for high-risk peptic ulcer bleeding. Methods We retrospectively reviewed the data of 386 patients with bleeding ulcers and high-risk stigmata who underwent emergent endoscopic hemostasis between March 2014 and October 2018. The least absolute shrinkage and selection operator method was used to identify predictors. The model was displayed as a nomogram. Internal validation was carried out using bootstrapping. The model was evaluated using the calibration plot, decision-curve analyses, and clinical impact curve. Results Overall, 386 patients meeting the inclusion criteria were enrolled, of whom 48 developed rebleeding within three days after initial endoscopic hemostasis. Predictors contained in the nomogram included albumin, prothrombin time, shock, haematemesis/melena and Forrest classification. The model showed good discrimination and good calibration with a C-index of 0.854 (C-index: 0.830 via bootstrapping validation). Decision-curve analyses and clinical impact curve also demonstrated that it was clinically valuable. Conclusion This study presents a nomogram that incorporates clinical, laboratory, and endoscopic features, effectively predicting rebleeding within three days after emergent endoscopic hemostasis and identifying high-risk rebleeding patients with peptic ulcer bleeding. Trial registration This clinical trial has been registered at ClinicalTrials.gov (ID: NCT04895904) approved by the International Committee of Medical Journal Editors (ICMJE). Supplementary Information The online version contains supplementary material available at 10.1186/s12876-022-02145-9.
|
33
|
Viability Study of Machine Learning-Based Prediction of COVID-19 Pandemic Impact in Obsessive-Compulsive Disorder Patients. Front Neuroinform 2022; 16:807584. [PMID: 35221957 PMCID: PMC8866769 DOI: 10.3389/fninf.2022.807584] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Accepted: 01/10/2022] [Indexed: 11/22/2022] Open
Abstract
Background Machine learning modeling can provide valuable support in different areas of mental health because it enables rapid predictions, and therefore supports decision making, based on valuable data. However, few studies have applied this method to predict symptom worsening based on sociodemographic, contextual, and clinical data. Thus, we applied machine learning techniques to identify predictors of symptomatologic changes in a Spanish cohort of OCD patients during the initial phase of the COVID-19 pandemic. Methods 127 OCD patients were assessed using the Yale–Brown Obsessive-Compulsive Scale (Y-BOCS) and a structured clinical interview during the COVID-19 pandemic. Machine learning models for classification (LDA and SVM) and regression (linear regression and SVR) were constructed to predict each symptom based on patients’ sociodemographic, clinical and contextual information. Results A Y-BOCS score prediction model was generated with 100% reliability at a score threshold of ± 6. Reliability of 100% was reached for obsessions and/or compulsions related to COVID-19. Symptoms of anxiety and depression were predicted with less reliability (correlation R of 0.58 and 0.68, respectively). Suicidal thoughts were predicted with a sensitivity of 79% and specificity of 88%. The best results were achieved by SVM and SVR. Conclusion Our findings reveal that sociodemographic and clinical data can be used to predict changes in OCD symptomatology. Machine learning may be a valuable tool for helping clinicians to rapidly identify patients at higher risk and therefore provide optimized care, especially in future pandemics. However, further validation of these models is required to ensure greater reliability of the algorithms for clinical implementation to specific objectives of interest.
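Sensitivity and specificity, as quoted for the suicidal-thoughts classifier in this abstract, are simple ratios from the confusion matrix. The Python sketch below shows the standard definitions on made-up labels; the function name and the data are assumptions for illustration only.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded as 0/1.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)

# Toy usage with hypothetical labels (1 = symptom present).
sens, spec = sensitivity_specificity(
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 1],
)
```

With a cohort of 127 patients, each misclassified case shifts these ratios by a visible amount, which is consistent with the authors' call for further validation.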
|
34
|
BAHD1 serves as a critical regulator of breast cancer cell proliferation and invasion. Breast Cancer 2022; 29:516-530. [DOI: 10.1007/s12282-022-01333-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/02/2021] [Accepted: 01/05/2022] [Indexed: 01/06/2023]
|
35
|
Short-Term Energy Forecasting Using Machine-Learning-Based Ensemble Voting Regression. Symmetry (Basel) 2022. [DOI: 10.3390/sym14010160] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 12/21/2022] Open
Abstract
Balancing energy supply and demand is indispensable for energy manufacturers. Accordingly, electric industries have paid attention to short-term energy forecasting to assist their management systems. This paper first compares multiple machine learning (ML) regressors during the training process. The five best ML algorithms, namely extra trees regressor (ETR), random forest regressor (RFR), light gradient boosting machine (LGBM), gradient boosting regressor (GBR), and K neighbors regressor (KNN), are trained to build our proposed voting regressor (VR) model. Final predictions are performed using the proposed ensemble VR and compared with the five selected ML benchmark models. The statistical autoregressive integrated moving average (ARIMA) model is also compared with the proposed model. For the experiments, energy usage and weather data are gathered from four regions of Jeju Island. Error measurements, including mean absolute percentage error (MAPE), mean absolute error (MAE), and mean squared error (MSE), are computed to evaluate the forecasting performance. Our proposed model outperforms the six baseline models, giving a minimum MAPE of 0.845% on the whole test set. This improved performance shows that our approach is promising for symmetrical forecasting using time series energy data in the power system sector.
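The voting step and the three error measures named in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes an unweighted average of the base-model forecasts, and the function names and sample values are hypothetical.

```python
from statistics import mean


def voting_predict(model_predictions):
    """Average the forecasts of several models element-wise (unweighted voting)."""
    return [mean(step) for step in zip(*model_predictions)]


def mae(actual, predicted):
    """Mean absolute error."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mse(actual, predicted):
    """Mean squared error."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; undefined if any actual value is 0."""
    return 100.0 * mean(abs((a - p) / a) for a, p in zip(actual, predicted))


# Hypothetical energy loads and three base-model forecasts (illustrative values only).
actual = [100.0, 120.0, 80.0]
forecasts = [
    [98.0, 123.0, 82.0],
    [102.0, 119.0, 79.0],
    [103.0, 121.0, 76.0],
]
ensemble = voting_predict(forecasts)
print(ensemble, mae(actual, ensemble), mse(actual, ensemble), mape(actual, ensemble))
```

MAPE is scale-free, which is presumably why it is the headline figure in the abstract, while MAE and MSE remain in the units (and squared units) of the load series.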
|
36
|
Forecasting Amazon Rain-Forest Deforestation Using a Hybrid Machine Learning Model. SUSTAINABILITY 2022. [DOI: 10.3390/su14020691] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/27/2023]
Abstract
The present work analyzes Amazon rain-forest deforestation, which can be characterized from actual data and predicted by means of artificial intelligence algorithms. A hybrid machine learning model was implemented using a dataset of 760 Brazilian Amazon municipalities, with static data (geographical, forest, and watershed, among others) together with time series data of annual deforestation area for the last 20 years (1999–2019). The designed learning model combines dense neural networks for the static variables and a recurrent Long Short Term Memory neural network for the temporal data. Many iterations were performed on augmented data, testing different configurations of the regression model to adjust the model hyper-parameters and generating a battery of tests to obtain the optimal model, which achieved an R-squared score of 87.82%. The final regression model predicts the increase in annual deforestation area (square kilometers) for a decade, from 2020 to 2030, predicting that deforestation will reach 1 million square kilometers by 2030, around 15% (compared with the present 1%) of the 5.5 to 6.7 million square kilometers of the rain-forest. The obtained results will help to understand the impact of man’s footprint on the Amazon rain-forest.
|
37
|
Packet Loss Measurement Based on Sampled Flow. Symmetry (Basel) 2021. [DOI: 10.3390/sym13112149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022] Open
Abstract
This paper aims to improve operators' awareness of network performance in the current asymmetric information environment. Specifically, because packet loss events are bursty and short-lived, and to better meet the real-time requirements of current high-speed backbone performance monitoring, this paper presents a model for Packet Loss Measurement at the access network boundary Based on Sampled Flow (PLMBSF) that takes both cost and real-time constraints into account. The model overcomes problems of previous estimation approaches, such as the inability to distinguish between packet losses before and after the monitoring point, deployment difficulties, and cooperative operation consistency. Using the Mathis equation and regression analysis, packet losses before and after the monitoring point can be measured using only the sampled flows generated by the access network boundary equipment. Comparison with trace-based passive packet loss measurement shows that although the proposed model is easily affected by factors such as flow length, loss rate, and sampling rate, the overall accuracy is still within the acceptable range. In addition, PLMBSF differs from trace-based loss measurement only in the input data granularity. Therefore, PLMBSF and its advantages are also applicable to aggregated traffic.
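For orientation, the Mathis equation mentioned in this abstract relates steady-state TCP throughput to segment size, round-trip time, and loss rate, so it can be inverted to estimate loss from observed throughput. The sketch below shows only that textbook relation, with the constant C = sqrt(3/2) from the simplest form of the model; it is not the PLMBSF model itself, and the function names and sample values are hypothetical.

```python
import math

# Constant from the simplest form of the Mathis model (assumption: periodic loss,
# every segment acknowledged).
MATHIS_C = math.sqrt(3.0 / 2.0)


def mathis_throughput(mss_bytes, rtt_s, loss_rate):
    """Approximate TCP throughput in bytes/s: (MSS / RTT) * C / sqrt(p)."""
    return (mss_bytes / rtt_s) * MATHIS_C / math.sqrt(loss_rate)


def mathis_loss_rate(mss_bytes, rtt_s, throughput_bytes_per_s):
    """Invert the Mathis equation to estimate the loss rate p from throughput."""
    return (MATHIS_C * mss_bytes / (rtt_s * throughput_bytes_per_s)) ** 2


# Illustrative values: 1460-byte segments, 50 ms round-trip time, 1% loss.
bw = mathis_throughput(1460, 0.05, 0.01)
print(round(bw), mathis_loss_rate(1460, 0.05, bw))
```

Because throughput scales with 1/sqrt(p), a fourfold increase in loss halves the predicted throughput, which is what makes the relation usable as a regression feature for loss estimation.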
|
38
|
Importance of Spatial Autocorrelation in Machine Learning Modeling of Polymetallic Nodules, Model Uncertainty and Transferability at Local Scale. MINERALS 2021. [DOI: 10.3390/min11111172] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/12/2023]
Abstract
Machine learning spatial modeling is used for mapping the distribution of deep-sea polymetallic nodules (PMN). However, the presence and influence of spatial autocorrelation (SAC) have not been extensively studied. SAC can inform variable selection before modeling, and it results in erroneous validation performance when ignored. ML models are also problematic when applied in areas far away from the initial training locations, especially if the (new) area to be predicted covers another feature space. Here, we study the spatial distribution of PMN in a geomorphologically heterogeneous area of the Peru Basin, where SAC of PMN exists. The local Moran’s I analysis showed that there are areas with a significantly higher or lower number of PMN, associated with different backscatter values, aspect orientation, and seafloor geomorphological characteristics. A quantile regression forests (QRF) model is evaluated using three cross-validation (CV) techniques (random-, spatial-, and cluster-blocking). We used the recently proposed “Area of Applicability” method to quantify the geographical areas where feature space extrapolation occurs. The results show that QRF predicts well in morphologically similar areas, with spatial block cross-validation being the least biased method. Conversely, random-CV overestimates the prediction performance. Under new conditions, the model transferability is reduced even on local scales, highlighting the need for spatial model-based dissimilarity analysis and transferability assessment in new areas.
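The difference between random and spatial block cross-validation described in this abstract comes down to how samples are assigned to folds. The following is a minimal sketch under stated assumptions: blocks are cells of a regular square grid, and the function names, block size, and coordinates are hypothetical rather than taken from the study.

```python
import random


def random_folds(n_points, k, seed=0):
    """Assign each point to one of k folds at random, ignoring location."""
    rng = random.Random(seed)
    order = list(range(n_points))
    rng.shuffle(order)
    folds = [0] * n_points
    for rank, index in enumerate(order):
        folds[index] = rank % k
    return folds


def spatial_block_folds(coords, block_size, k, seed=0):
    """Assign whole grid blocks to folds so that nearby points share a fold."""
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in coords]
    unique_blocks = sorted(set(blocks))
    rng = random.Random(seed)
    rng.shuffle(unique_blocks)
    block_to_fold = {block: i % k for i, block in enumerate(unique_blocks)}
    return [block_to_fold[block] for block in blocks]


# Hypothetical sample locations (arbitrary units) falling into three grid blocks.
coords = [(0.5, 0.5), (0.7, 0.2), (5.1, 5.3), (5.9, 5.0), (10.2, 0.1)]
print(random_folds(len(coords), k=2))
print(spatial_block_folds(coords, block_size=5.0, k=2))
```

With random folds, a test point can sit next to a training point and share its autocorrelated value, which inflates the apparent skill; block assignment keeps neighbours on the same side of the train/test split.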
|
39
|
Machine Learning in Aging: An Example of Developing Prediction Models for Serious Fall Injury in Older Adults. J Gerontol A Biol Sci Med Sci 2021; 76:647-654. [PMID: 32498077 DOI: 10.1093/gerona/glaa138] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 02/03/2020] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to lack of reproducibility and difficulty in understanding the complex algorithms that underlie models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. METHOD We discuss the underlying algorithms of decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. RESULTS Decision tree is a machine learning method that produces a model resembling a flow chart. Random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated using data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating curve of 0.54 for decision tree and 0.66 for random forest). CONCLUSIONS Machine learning methods offer an alternative to traditional approaches for modeling outcomes in aging, but their use should be justified and output should be carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
|
40
|
Predicting hydrogen storage in MOFs via machine learning. PATTERNS (NEW YORK, N.Y.) 2021; 2:100291. [PMID: 34286305 PMCID: PMC8276024 DOI: 10.1016/j.patter.2021.100291] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 03/24/2021] [Revised: 05/10/2021] [Accepted: 05/26/2021] [Indexed: 11/14/2022]
Abstract
The H2 capacities of a diverse set of 918,734 metal-organic frameworks (MOFs) sourced from 19 databases are predicted via machine learning (ML). Using only 7 structural features as input, ML identifies 8,282 MOFs with the potential to exceed the capacities of state-of-the-art materials. The identified MOFs are predominantly hypothetical compounds having low densities (<0.31 g cm⁻³) in combination with high surface areas (>5,300 m² g⁻¹), void fractions (∼0.90), and pore volumes (>3.3 cm³ g⁻¹). The relative importance of the input features is characterized, and dependencies on the ML algorithm and training set size are quantified. The most important features for predicting H2 uptake are pore volume (for gravimetric capacity) and void fraction (for volumetric capacity). The ML models are available on the web, allowing for rapid and accurate predictions of the hydrogen capacities of MOFs from limited structural data; the simplest models require only a single crystallographic feature.
|
42
|
Wrist Angle Estimation With a Musculoskeletal Model Driven by Electrical Impedance Tomography Signals. IEEE Robot Autom Lett 2021. [DOI: 10.1109/lra.2021.3060400] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
|
43
|
A regression approach to zebra crossing detection based on convolutional neural networks. IET CYBER-SYSTEMS AND ROBOTICS 2021. [DOI: 10.1049/csy2.12006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
|
44
|
Prediction of overall survival and response to immune checkpoint inhibitors: An immune-related signature for gastric cancer. Transl Oncol 2021; 14:101082. [PMID: 33784584 PMCID: PMC8027281 DOI: 10.1016/j.tranon.2021.101082] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/07/2020] [Revised: 03/16/2021] [Accepted: 03/18/2021] [Indexed: 12/12/2022] Open
Abstract
To our knowledge, this is the first immune-gene-related signature of gastric cancer that was validated in various ways. The signature was linked to mutation status, and we found differences in mutations between the two risk groups. The signature has a function similar to that of TMB.
Gastric cancer (GC) is common in East Asia and South and Central America. Most GC patients miss the opportunity for surgery. Despite their therapeutic potential, immune checkpoint inhibitors (ICIs) only work in a subset of patients with GC. Thus, this study aimed to construct a signature for diagnosis, prognosis, and prediction of response to ICIs. A multivariate analysis showed that the 8-immune-related-gene (IRG) signature was an independent prognostic factor of overall survival among GC patients. In the high-risk group by 8IRG signature risk score, the fractions of CD4 T cells, M2 macrophages, and monocytes, which are associated with the progression of cancers, were higher. The low-risk group had a higher immunophenoscore, which indicated a better response to ICIs.
|
45
|
EPC Labels and Building Features: Spatial Implications over Housing Prices. SUSTAINABILITY 2021. [DOI: 10.3390/su13052838] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/16/2022]
Abstract
The influence of building or dwelling energy performance on the real estate market dynamics and pricing processes is deeply explored, due to the fact that energy efficiency improvement is one of the fundamental reasons for retrofitting the existing housing stock. Nevertheless, the joint effect produced by the building energy performance and the architectural, typological, and physical-technical attributes seems poorly studied. Thus, the aim of this work is to investigate the influence of both energy performance and diverse features on property prices, by performing spatial analyses on a sample of housing properties listed on Turin’s real estate market and on different sub-samples. In particular, Exploratory Spatial Data Analyses (ESDA) statistics, standard hedonic price models (Ordinary Least Squares—OLS) and Spatial Error Models (SEM) are firstly applied on the whole data sample, and then on three different sub-samples: two territorial clusters and a sub-sample representative of the most energy inefficient buildings constructed between 1946 and 1990. Results demonstrate that Energy Performance Certificate (EPC) labels are gaining power in influencing price variations, contrary to the empirical evidence that emerged in some previous studies. Furthermore, the presence of the spatial effects reveals that the impact of energy attributes changes in different sub-markets and thus has to be spatially analysed.
|
46
|
Identification of key genes in coronary artery disease: an integrative approach based on weighted gene co-expression network analysis and their correlation with immune infiltration. Aging (Albany NY) 2021; 13:8306-8319. [PMID: 33686958 PMCID: PMC8034924 DOI: 10.18632/aging.202638] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/21/2020] [Accepted: 01/29/2021] [Indexed: 12/04/2022]
Abstract
This study aimed to identify key genes related to coronary artery disease (CAD) and their association with immune cell infiltration. GSE20680 and GSE20681 were downloaded from GEO. We identified red and pink modules in the WGCNA analysis and found 104 genes in these two modules. Next, least absolute shrinkage and selection operator (LASSO) logistic regression was used to screen and verify the diagnostic markers of CAD. We identified ASCC2, LRRC18, and SLC25A37 as the key genes in CAD diagnosis. We further studied immune cell infiltration in CAD patients with CIBERSORT, and the correlation between key genes and infiltrating immune cells was analyzed. We also found that immune cells, including M0 macrophages, resting mast cells, and CD8 T cells, were associated with ASCC2, LRRC18, and SLC25A37. Gene enrichment analysis indicated that these genes were mainly enriched in the apoptotic signaling pathway in the biological pathway analysis and in riboflavin metabolism in the KEGG analysis. The diagnostic efficiency of these key genes measured by AUC in the training set, testing set, and validation cohort was 0.92, 0.96, and 0.83, respectively. In conclusion, ASCC2, LRRC18, and SLC25A37 can be used as diagnostic markers of CAD, and immune cell infiltration plays an important role in the onset and development of CAD.
|
47
|
Intra-abdominal infection in acute pancreatitis in eastern China: microbiological features and a prediction model. ANNALS OF TRANSLATIONAL MEDICINE 2021; 9:477. [PMID: 33850874 PMCID: PMC8039642 DOI: 10.21037/atm-21-399] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023]
Abstract
Background This study aimed to investigate the microbial distribution of intra-abdominal infection in patients with acute pancreatitis, and to develop a reliable prediction model to guide the use of antibiotics. Methods Inpatients with acute pancreatitis between January 2015 and June 2020 were enrolled in the study. Participants were divided into the intra-abdominal infection group and the non-infection group. Isolated pathogens and antibiotic susceptibility were documented. Characteristic parameters, laboratory results, and outcomes were also compared. A least absolute shrinkage and selection operator (LASSO) regression model was used to select the risk factors associated with intra-abdominal infection in patients with acute pancreatitis. Logistic regression analysis, a random forest model, and an artificial neural network were also used to validate the performance of the selected predictors in intra-abdominal infection prediction. A novel nomogram based on the selected predictors was established to provide the individualized risk of developing intra-abdominal infection in patients with acute pancreatitis. Results A total of 711 participants were enrolled in the study, and of these, 182 (25.6%) had intra-abdominal infection. Of the 247 isolated pathogens, 45 (18.2%) were multidrug-resistant bacteria, and antibiotic susceptibility was lower than that reported by the China Antimicrobial Surveillance Network 2020. The LASSO method identified 5 independent predictors [intra-abdominal pressure (IAP), acute physiology and chronic health evaluation II (APACHE II), computed tomography severity index (CTSI), the severity of pancreatitis, and intensive care unit (ICU) admission] of intra-abdominal infection, which were validated by three different models. The area under the curve was >0.95 for all 5 predictors. A clinically useful nomogram based on these predictors was successfully established. Conclusions Multidrug-resistant bacteria were quite common in intra-abdominal infection. IAP, APACHE II, CTSI, the severity of pancreatitis, and ICU admission were identified as risk factors, and the new nomogram based on these could help clinicians estimate the risk of intra-abdominal infection and optimize antimicrobial prescription for acute pancreatitis patients.
|
48
|
Potential of spectroscopic analyses for non-destructive estimation of tea quality-related metabolites in fresh new leaves. Sci Rep 2021; 11:4169. [PMID: 33603126 PMCID: PMC7892543 DOI: 10.1038/s41598-021-83847-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/13/2020] [Accepted: 02/09/2021] [Indexed: 01/31/2023] Open
Abstract
Spectroscopic sensing provides physical and chemical information in a non-destructive and rapid manner. To develop non-destructive estimation methods for tea quality-related metabolites in fresh leaves, we estimated the contents of free amino acids, catechins, and caffeine in fresh tea leaves using visible to short-wave infrared hyperspectral reflectance data and machine learning algorithms. We acquired these data from approximately 200 new leaves of varying status and then constructed regression models from combinations of six pre-processed spectral patterns and five algorithms. In most phenotypes, the combination of de-trending pre-processing and the Cubist algorithm was robustly selected as the best combination in each round over 100 repetitions, which were evaluated based on the ratio of performance to deviation (RPD) values. The mean RPD values ranged from 1.1 to 2.7, and most of them were above the acceptable or accurate threshold (RPD = 1.4 or 2.0, respectively). Data-based sensitivity analysis identified the important hyperspectral regions around 1500 and 2000 nm. The present spectroscopic approaches indicate that most tea quality-related metabolites can be estimated non-destructively, and that pre-processing techniques help to improve estimation accuracy.
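The RPD criterion used in this abstract is commonly computed as the standard deviation of the observed values divided by the root mean square error of the predictions. The sketch below assumes that definition with the sample standard deviation (some authors use the population form); the thresholds 1.4 and 2.0 are the ones quoted in the abstract, while the function names and data are hypothetical.

```python
import math
from statistics import stdev


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations / RMSE of predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    rmse = math.sqrt(sum(squared_errors) / len(squared_errors))
    return stdev(observed) / rmse


def rpd_label(value):
    """Classify an RPD value using the thresholds quoted in the abstract."""
    if value >= 2.0:
        return "accurate"
    if value >= 1.4:
        return "acceptable"
    return "below threshold"


# Illustrative metabolite contents and model estimates (made-up values).
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 2.5, 2.5, 3.5, 5.5]
score = rpd(observed, predicted)
print(score, rpd_label(score))
```

An RPD near 1 means the model's error is about as large as the natural spread of the data, so it predicts little better than the mean.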
|
49
|
Fish early life stage toxicity prediction from acute daphnid toxicity and quantum chemistry. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2021; 32:151-174. [PMID: 33525942 DOI: 10.1080/1062936x.2021.1874514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/06/2020] [Accepted: 01/07/2021] [Indexed: 06/12/2023]
Abstract
One step towards reduced animal testing is the use of in silico screening methods to predict toxicity of chemicals, which requires high-quality data to develop models that are reliable and clearly interpretable. We compiled a large data set of fish early life stage no observed effect concentration endpoints (FELS NOEC) based on published data sources and internal studies, containing data for 338 molecules. Furthermore, we developed a new quantitative structure-activity-activity relationship (QSAAR) model to inform estimation of this endpoint using a combination of dimensionality reduction, regularization, and domain knowledge. In particular, we made use of a sparse partial least squares algorithm (sPLS) to select relevant variables from a huge number of molecular descriptors ranging from topological to quantum chemical properties. The final QSAAR model is of low complexity, consisting of 2 latent variables based on 8 molecular descriptors and experimental Daphnia magna acute data (EC50, 48 h). We provide a mechanistic interpretation of each model parameter. The model performs well, with a coefficient of determination r² of 0.723 on the training set (cross-validated q² = 0.686) and comparable predictivity on a test data set of chemically related molecules with experimental Daphnia magna data (test r² = 0.687, RMSE = 0.793 log units).
|
50
|
Development of Machine Learning Models to Predict Compressed Sward Height in Walloon Pastures Based on Sentinel-1, Sentinel-2 and Meteorological Data Using Multiple Data Transformations. REMOTE SENSING 2021. [DOI: 10.3390/rs13030408] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]
Abstract
Accurate information about the available standing biomass on pastures is critical for the adequate management of grazing and its promotion to farmers. In this paper, machine learning models are developed to predict available biomass expressed as compressed sward height (CSH) from readily accessible meteorological, optical (Sentinel-2) and radar satellite data (Sentinel-1). This study assumed that combining heterogeneous data sources, data transformations and machine learning methods would improve the robustness and the accuracy of the developed models. A total of 72,795 records of CSH with a spatial positioning, collected in 2018 and 2019, were used and aggregated according to a pixel-like pattern. The resulting dataset was split into a training one with 11,625 pixellated records and an independent validation one with 4952 pixellated records. The models were trained with a 19-fold cross-validation. A wide range of performances was observed (with mean root mean square error (RMSE) of cross-validation ranging from 22.84 mm of CSH to infinite-like values), and the four best-performing models were a cubist, a glmnet, a neural network and a random forest. These models had an RMSE of independent validation lower than 20 mm of CSH at the pixel-level. To simulate the behavior of the model in a decision support system, performances at the paddock level were also studied. These were computed according to two scenarios: either the predictions were made at a sub-parcel level and then aggregated, or the data were aggregated at the parcel level and the predictions were made for these aggregated data. The results obtained in this study were more accurate than those found in the literature concerning pasture budgeting and grassland biomass evaluation. The training of the 124 models resulting from the described framework was part of the realization of a decision support system to help farmers in their daily decision making.
|