Reference Citation Analysis

Total Articles: 31 (from Reference Citation Analysis)
Article PDFs: 4
Cited by > 0: 22
Searched Name: Regression trees

Number	Citation Analysis
1	Development and application of a weighted change score to evaluate interventions for vasomotor symptoms in patients with breast cancer using regression trees: a cohort study. Breast Cancer Res Treat 2024:10.1007/s10549-024-07360-4. [PMID: 38763972 DOI: 10.1007/s10549-024-07360-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 04/24/2024] [Indexed: 05/21/2024]
Abstract: PURPOSE Vasomotor symptoms (VMS) are common among individuals with breast cancer (BC), and poorly managed symptoms are associated with reduced quality of life, treatment discontinuation, and poorer breast cancer outcomes. Direct comparisons among therapies are limited, as prior studies evaluating VMS interventions have used heterogeneous change measures that may not fully assess the perceived impact of change in VMS severity. METHODS We performed a prospective study in which BC patients chose one of four categories of interventions to manage VMS. Change in VMS severity at 6 weeks was assessed using the validated Hot Flush Rating Scale (HFRS). A novel weighted change score integrating baseline symptom severity and directionality of change was computed to maximize the correlation between the change score and a perceived treatment effectiveness score. Variables influencing change in VMS severity were included in a regression tree to model factors influencing the weighted change score. RESULTS Eighty-eight patients completed 100 baseline and follow-up questionnaires assessing VMS. Correlations between treatment effectiveness and VMS outcomes strengthened after adjustment for baseline symptoms. Patients with low VMS severity at baseline did not perceive change in treatment effectiveness. Intervention category was predictive of change in HFRS at 6 weeks. CONCLUSION Baseline symptom severity and the directionality of change (improvement or deterioration of symptoms) influenced the perception of clinically meaningful change in VMS severity. Future interventional studies using the weighted change score should target patients with moderate-to-high baseline severity.
Key Words: Breast cancer; Hot flashes; Machine learning; Regression trees; Vasomotor symptoms
Grants: RGPIN-2022-04811 Discovery Grant, Natural Sciences and Engineering Research Council of Canada
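Several entries in this list grow a regression tree, which recursively splits the data at the predictor threshold that most reduces the squared error within the resulting groups and predicts the mean of each leaf. The sketch below illustrates that CART-style split rule on a single hypothetical predictor with made-up data; the helper names (`sse`, `best_split`, `build_tree`, `predict`) are illustrative assumptions, and this is a generic example, not the weighted-change-score model of the study above.

```python
def sse(values):
    """Sum of squared deviations from the mean (the CART impurity for regression)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, child_sse) for the split on x that minimizes the summed
    SSE of the two children, or (None, parent_sse) if no allowed split helps."""
    pairs = sorted(zip(x, y))
    best_threshold, best_sse = None, sse(y)
    # Each child must keep at least min_leaf observations.
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_sse:
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_sse = total
    return best_threshold, best_sse


def build_tree(x, y, max_depth=2, min_leaf=2):
    """Recursively grow a single-feature regression tree as nested dicts."""
    threshold, _ = best_split(x, y, min_leaf)
    if max_depth == 0 or threshold is None:
        return {"value": sum(y) / len(y)}  # leaf: predict the node mean
    left = [(a, b) for a, b in zip(x, y) if a <= threshold]
    right = [(a, b) for a, b in zip(x, y) if a > threshold]
    lx, ly = zip(*left)
    rx, ry = zip(*right)
    return {
        "threshold": threshold,
        "left": build_tree(lx, ly, max_depth - 1, min_leaf),
        "right": build_tree(rx, ry, max_depth - 1, min_leaf),
    }


def predict(node, value):
    """Route a value down the tree and return the mean of the leaf it reaches."""
    while "threshold" in node:
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["value"]


# Hypothetical data with a clear step between x = 3 and x = 4
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = build_tree(x, y, max_depth=1, min_leaf=2)
print(tree["threshold"])  # 3.5
```

With `max_depth=1` the tree is a single split; raising `max_depth` lets each child split again until `min_leaf` prevents it.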
2	A novel data-driven model for prediction and adaptive control of pH in raceway reactor for microalgae cultivation. N Biotechnol 2024;82:1-13. [PMID: 38615946 DOI: 10.1016/j.nbt.2024.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/05/2024] [Accepted: 04/11/2024] [Indexed: 04/16/2024]
Abstract: This work proposes a new data-driven model to estimate and predict pH dynamics in freshwater raceway photobioreactors. The model is based purely on data measured from the reactor and divides the pH dynamics into two behaviors: the variation of pH due to photosynthesis by the microalgae, and the effect of CO2 injections into the medium for control purposes. Moreover, the model parameters were observed to vary throughout the day depending on weather conditions and reactor status. A decision tree algorithm was therefore also developed to capture this parameter variation from measured system variables such as solar radiation, medium temperature, and medium level. The proposed model was validated on a data set of more than 100 days collected over 10 months in a semi-industrial raceway reactor, covering a wide range of weather and system scenarios. Additionally, the model was used to design an adaptive control algorithm, which was experimentally tested and compared with a classical fixed-parameter control approach.
Key Words: Adaptive model; Forced response; Free response; Microalgae; Modeling; Open reactor; Regression trees
3	When mind and measurement diverge: the interplay between subjective cognitive complaints (SCCs), objective cognition, age, and depression in autistic adults. Psychiatry Res 2024;333:115759. [PMID: 38301288 DOI: 10.1016/j.psychres.2024.115759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 01/24/2024] [Accepted: 01/25/2024] [Indexed: 02/03/2024]
Abstract: While the increased incidence of dementia and subjective cognitive complaints (SCCs) suggests that autistic adults may face cognitive challenges at older age, the extent to which SCCs predict (future) cognitive functioning remains uncertain. This uncertainty is complicated by associations with variables such as depression. The current study aims to unravel the interplay of age, depression, cognitive performance, and SCCs in autism. Using a large cross-sectional cohort of autistic (n=202) and non-autistic adults (n=247), we analyzed associations of SCCs with age, depression, and cognitive performance across three domains (visual memory, verbal memory, and fluency). Results showed a strong, significant association between depression and SCCs in both autistic and non-autistic adults. Cognitive performance was not significantly associated with SCCs, except for a (modest) association between visual memory performance and SCCs in autistic adults only. Follow-up regression tree analysis indicated that depression and being autistic were considerably more predictive of SCCs than objective cognitive performance. Neither age nor sex was significantly associated with SCCs. These findings indicate that self-reported cognitive functioning does not equal cognitive performance and should be interpreted with care, especially in individuals with high rates of depression. Longitudinal investigations are needed to understand the role of SCCs in dementia and cognitive health in autism.
Key Words: Aging; Autism; Autistic Adults; Cognition; Cognitive difficulties; Depression; Regression trees; Subjective cognitive complaints
MESH Headings: Adult; Humans; Autistic Disorder/complications; Depression/complications; Depression/epidemiology; Cross-Sectional Studies; Cognition; Dementia; Neuropsychological Tests
4	A COMPREHENSIVE ECO-EFFICIENCY ANALYSIS OF WASTEWATER TREATMENT PLANTS: ESTIMATION OF OPTIMAL OPERATIONAL COSTS AND GREENHOUSE GAS EMISSIONS. WATER RESEARCH 2023;243:120354. [PMID: 37517147 DOI: 10.1016/j.watres.2023.120354] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 07/11/2023] [Accepted: 07/13/2023] [Indexed: 08/01/2023]
Abstract: The transition to a carbon-neutral and sustainable urban water cycle requires improving eco-efficiency in wastewater treatment processes. Reliable estimates are fundamental to supporting decision-making based on eco-efficiency evaluations. In this study, the eco-efficiency of a sample of 109 wastewater treatment plants (WWTPs) was evaluated using the efficiency analysis tree method. This method combines machine learning and linear programming techniques and therefore overcomes the overfitting limitations of the non-parametric methods used in past research on this topic. Results from the case study revealed that optimal costs and greenhouse gas emissions depend on the quantity of organic matter and suspended solids removed from wastewater. The estimated average eco-efficiency is 0.373, which implies that the assessed WWTPs could save 0.32 €/m3 and 0.11 kg of CO2 equivalent/m3. Moreover, only 4 out of 109 WWTPs are identified as eco-efficient, which implies that the majority of the evaluated facilities can achieve substantial savings in operational costs and greenhouse gas emissions.
Key Words: Eco-efficiency; Economics; Greenhouse gas emissions savings; Linear programming; Regression trees; Wastewater treatment plants
MESH Headings: Greenhouse Gases/analysis; Waste Disposal, Fluid/methods; Wastewater; Water Purification; Greenhouse Effect
5	A comprehensive assessment of energy efficiency of wastewater treatment plants: An efficiency analysis tree approach. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023;885:163539. [PMID: 37146822 DOI: 10.1016/j.scitotenv.2023.163539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 04/12/2023] [Accepted: 04/12/2023] [Indexed: 05/07/2023]
Abstract: Wastewater treatment plants (WWTPs) are energy-intensive facilities. Controlling energy use in WWTPs could bring substantial benefits to people and the environment. Understanding how energy efficient the wastewater treatment process is, and what drives efficiency, would allow wastewater to be treated more sustainably. In this study, we employed the efficiency analysis trees approach, which combines machine learning and linear programming techniques, to estimate the energy efficiency of the wastewater treatment process. The findings indicated considerable energy inefficiency among WWTPs in Chile. The mean energy efficiency was 0.287, suggesting that energy use could be reduced by 71.3% while treating the same volume of wastewater. This was equivalent to an average reduction in energy use of 0.40 kWh/m³. Moreover, only 4 out of 203 assessed WWTPs (1.97%) were identified as energy efficient. The age of the treatment plant and the type of secondary technology were also found to play an important role in explaining energy efficiency variations among WWTPs.
Key Words: Bootstrap regression; Energy efficiency; Energy savings; Linear programming; Regression trees; Wastewater treatment plants
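The 71.3% figure in the entry above follows from reading the mean efficiency score as an input-oriented radial measure: a plant with score θ could, in principle, treat the same volume using θ times its current energy, saving a share 1 − θ. A minimal arithmetic sketch, assuming that interpretation; the function name `input_savings` is hypothetical, and the sketch does not reproduce the efficiency analysis tree estimation itself.

```python
def input_savings(efficiency, observed_input):
    """Attainable input saving under an input-oriented radial efficiency score:
    the same output could be produced with efficiency * observed_input."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must lie in (0, 1]")
    return (1 - efficiency) * observed_input


mean_efficiency = 0.287  # mean energy efficiency reported in the abstract
reduction_share = input_savings(mean_efficiency, 1.0)  # per unit of energy used
print(f"{reduction_share:.1%}")  # 71.3%
print(f"{4 / 203:.2%}")  # 1.97% of plants rated efficient
```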
6	Benzo[a]pyrene in Moscow road dust: pollution levels and health risks. ENVIRONMENTAL GEOCHEMISTRY AND HEALTH 2023;45:1669-1694. [PMID: 35583719 DOI: 10.1007/s10653-022-01287-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/29/2021] [Accepted: 04/19/2022] [Indexed: 06/15/2023]
Abstract: Benzo[a]pyrene (BaP) is one of the priority pollutants in the urban environment. For the first time, the accumulation of BaP in road dust on different types of Moscow roads has been determined. The average BaP content in road dust is 0.26 mg/kg, which is 53 times higher than the BaP content in the background topsoils (Umbric Albeluvisols) of the Moscow Meshchera lowland, 50 km east of the city. The most polluted territories are large roads (0.29 mg/kg, exceeding the maximum permissible concentration (MPC) in soils by a factor of 14) and courtyard parking lots (0.37 mg/kg, exceeding the MPC by a factor of 19). In the city center, the BaP content in courtyard dust reaches 1.02 mg/kg (exceeding the MPC by a factor of 51). The accumulation of BaP depends on the parameters of the street canyons formed by buildings along the roads: BaP content reaches its maximum in short canyons (< 500 m). Relatively wide canyons accumulate BaP 1.6 times more actively than narrow canyons. BaP accumulation in road dust increases significantly on the Third Ring Road (TRR), highways, and medium and small roads with an average canyon height > 20 m. Public health risks from exposure to BaP-contaminated road dust particles were assessed using the US EPA methodology. The main BaP exposure pathway is oral ingestion (> 90% of the total BaP intake). The carcinogenic risk for adults is highest in courtyard areas in the south, southwest, northwest, and center of Moscow. The minimum carcinogenic risk is characteristic of highways and the TRR, where nonstop traffic predominates.
Key Words: Benzo[a]pyrene; Health risks; Polycyclic aromatic hydrocarbons; Regression trees; Road dust; Street canyons
MESH Headings: Dust/analysis; Benzo(a)pyrene; Polycyclic Aromatic Hydrocarbons/analysis; Air Pollutants/analysis; Moscow; Environmental Monitoring/methods; Carcinogens/analysis; Risk Assessment
7	Estimation and analysis of missing temperature data in high altitude and snow-dominated regions using various machine learning methods. ENVIRONMENTAL MONITORING AND ASSESSMENT 2023;195:517. [PMID: 36976414 DOI: 10.1007/s10661-023-11143-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Accepted: 03/16/2023] [Indexed: 06/18/2023]
Abstract: Considering the importance of limited natural resources, accurately recording and evaluating temperature data is critical. Daily average temperature values for 2019-2021 from eight highly correlated meteorological stations in northeastern Turkey, characterized by a mountainous and cold climate, were analyzed using artificial neural network (ANN), support vector regression (SVR), and regression tree (RT) methods. The outputs of the different machine learning methods were compared using several statistical evaluation criteria and the Taylor diagram. ANN6, ANN12, medium Gaussian SVR, and linear SVR were chosen as the most suitable methods, especially because of their success in estimating data at high (> 15 ℃) and low (< 0 ℃) temperatures. All the methodologies and network architectures used produced successful results (NSE-R² > 0.90). Some deviations were observed in the estimates, especially in the -1 ~ 5 ℃ range where snowfall begins, because fresh snow reduces the amount of heat emitted from the ground in mountainous areas with heavy snowfall. In ANN models with low neuron numbers (ANN1, ANN2, ANN3), increasing the number of layers does not affect the results. However, in models with high neuron counts, increasing the number of layers improves estimation accuracy.
Key Words: Artificial neural networks; Cold regions; Machine learning; Regression trees; Support vector regression; Temperature
MESH Headings: Temperature; Snow; Altitude; Environmental Monitoring/methods; Machine Learning
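The temperature study above reports goodness of fit as NSE-R² > 0.90. The sketch below shows how those two measures are commonly computed, assuming NSE is the Nash-Sutcliffe efficiency and R² is the squared Pearson correlation between observed and estimated values; the abstract does not define them further, and the temperatures are hypothetical.

```python
def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / (sum of squared deviations of observed).
    1 is a perfect fit; 0 means no better than predicting the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


def r_squared(observed, simulated):
    """Coefficient of determination taken as the squared Pearson correlation."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_s = sum(simulated) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(observed, simulated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


# Hypothetical daily mean temperatures (degrees C) and gap-filling estimates
observed = [-3.0, 0.5, 4.0, 12.0, 18.0]
estimated = [-2.5, 0.0, 4.5, 11.0, 18.5]
print(round(nse(observed, estimated), 3), round(r_squared(observed, estimated), 3))
```

NSE penalizes bias as well as scatter, whereas the squared correlation does not, which is why the two are often reported together.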
8	Global patterns and drivers of influenza decline during the COVID-19 pandemic. Int J Infect Dis 2023;128:132-139. [PMID: 36608787 PMCID: PMC9809002 DOI: 10.1016/j.ijid.2022.12.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/05/2022] [Revised: 12/02/2022] [Accepted: 12/27/2022] [Indexed: 01/09/2023]
Abstract: OBJECTIVES Influenza circulation reportedly declined during the COVID-19 pandemic in many countries. Neither the worldwide occurrence of this change nor its potential drivers have been studied. METHODS The change in the proportion of positive influenza samples, reported by country and trimester, was computed relative to the 2014-2019 period using the FluNet database. Random forests were used to determine predictors of change from demographic, weather, pandemic preparedness, COVID-19 incidence, and pandemic response characteristics. Regression trees were used to classify observations according to these predictors. RESULTS During the COVID-19 pandemic, the influenza decline relative to prepandemic levels was global but heterogeneous across space and time. The decline exceeded 50% for 311 of 376 country-trimesters and even exceeded 99% for 135. COVID-19 incidence and pandemic preparedness were the two most important predictors of the decline. Europe and North America initially showed limited decline despite strict COVID-19 restrictions; however, a strong decline followed in most temperate countries where pandemic preparedness, COVID-19 incidence, and social restrictions were high, whereas the decline was limited in countries where these factors were low. The "zero-COVID" countries experienced the greatest decline. CONCLUSION Our findings set the stage for interpreting the resurgence of influenza worldwide.
Key Words: COVID-19 pandemic; Global analysis; Influenza; Regression trees
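The influenza study measures decline as a change in the proportion of positive samples relative to a 2014-2019 baseline. A minimal sketch of such a relative change, assuming a simple (current − baseline) / baseline definition (the abstract does not give the exact formula) and using hypothetical proportions; `relative_change` is an illustrative name.

```python
def relative_change(current_share, baseline_share):
    """Relative change in the proportion of positive samples versus a baseline.
    -0.5 means a 50% decline; -0.99 means a 99% decline."""
    if baseline_share <= 0:
        raise ValueError("baseline share must be positive")
    return (current_share - baseline_share) / baseline_share


# Hypothetical proportions of positive samples for one country-trimester
baseline = 0.20   # e.g. mean proportion positive over 2014-2019
pandemic = 0.002  # proportion positive during the pandemic
change = relative_change(pandemic, baseline)
print(f"{change:.0%}")  # -99%
```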
9	BoostMEC: predicting CRISPR-Cas9 cleavage efficiency through boosting models. BMC Bioinformatics 2022;23:446. [PMID: 36289480 PMCID: PMC9597963 DOI: 10.1186/s12859-022-04998-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2022] [Accepted: 10/21/2022] [Indexed: 11/10/2022]
Abstract: BACKGROUND In the CRISPR-Cas9 system, the efficiency of genetic modifications has been found to vary depending on the single guide RNA (sgRNA) used. A variety of sgRNA properties have been found to be predictive of CRISPR cleavage efficiency, including the position-specific sequence composition of sgRNAs, global sgRNA sequence properties, and thermodynamic features. While existing deep learning-based approaches provide competitive prediction accuracy, a more interpretable model is desirable to help understand how different features may contribute to CRISPR-Cas9 cleavage efficiency. RESULTS We propose a gradient boosting approach, using LightGBM to develop an integrated tool, BoostMEC (Boosting Model for Efficient CRISPR), for predicting wild-type CRISPR-Cas9 editing efficiency. We benchmark BoostMEC against 10 popular models on 13 external datasets and show its competitive performance. CONCLUSIONS BoostMEC can provide state-of-the-art predictions of CRISPR-Cas9 cleavage efficiency for sgRNA design and selection. Relying on direct and derived sequence features of sgRNAs and based on conventional machine learning, BoostMEC maintains an advantage over other state-of-the-art, deep learning-based CRISPR efficiency prediction models through its ability to produce more interpretable feature insights and predictions.
Key Words: CRISPR-Cas9; Feature engineering; Interpretability; LightGBM; Machine learning; Regression trees; sgRNA
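BoostMEC builds on gradient boosting of regression trees via LightGBM. The core idea can be sketched without LightGBM: start from the mean, then repeatedly fit a small tree to the current residuals (the negative gradient of squared-error loss) and add a shrunken copy of it to the model. The sketch below uses one-split trees (stumps), a single hypothetical feature, and invented data; all function names are illustrative. It is a plain gradient-boosting example, not LightGBM's histogram-based, leaf-wise algorithm or BoostMEC's feature set.

```python
def fit_stump(x, y):
    """Least-squares regression stump on one feature: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or err < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (err, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant
        mean = sum(y) / len(y)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, value):
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump is fit to the current residuals,
    which equal the negative gradient of 0.5 * (y - f(x)) ** 2."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, preds)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each new tree's contribution by the learning rate.
        preds = [p + learning_rate * stump_predict(stump, v) for p, v in zip(preds, x)]
    return base, learning_rate, stumps


def boost_predict(model, value):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, value) for s in stumps)


# Hypothetical single-feature training data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 2.0, 2.0, 5.0, 5.0, 6.0, 6.0]
model = boost(x, y)
mse_before = sum((t - sum(y) / len(y)) ** 2 for t in y) / len(y)
mse_after = sum((t - boost_predict(model, v)) ** 2 for t, v in zip(y, x)) / len(y)
print(mse_before, mse_after)
```

A small learning rate with many rounds trades training speed for smoother, less overfit fits; libraries such as LightGBM expose the same trade-off through their own parameters.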
10	Discrimination between hereditary spastic paraplegia and cerebral palsy based on gait analysis data: A machine learning approach. Gait Posture 2022;98:34-38. [PMID: 36041285 DOI: 10.1016/j.gaitpost.2022.08.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Revised: 07/12/2022] [Accepted: 08/15/2022] [Indexed: 02/02/2023]
Abstract: BACKGROUND There is no current consensus on how to differentiate between hereditary spastic paraplegia and spastic cerebral palsy on the basis of clinical presentation. Several previous studies have investigated differences in kinematic parameters obtained from clinical gait analysis. None have attempted to combine multiple gait and physical exam measures to discriminate between these two diagnoses. This study aims to investigate the ability of a machine learning approach using data from clinical gait analysis to differentiate these cohorts. METHODS A retrospective analysis of a gait database compiled a dataset of 179 gait and physical exam variables from 28 individuals (62 analyses) diagnosed with hereditary spastic paraplegia and 678 individuals (1504 analyses) with bilateral spastic cerebral palsy. These data were used in a Bayesian additive regression tree (BART) analysis, with classes defined by medical record diagnosis. A 10-fold cross-validation generated a probability distribution that each analysis came from an individual carrying the hereditary spastic paraplegia diagnosis. A diagnostic probability cutoff threshold balanced type I and type II errors. Predicted versus actual diagnoses were classified in a contingency table. RESULTS The algorithm was able to correctly classify the two diagnoses with 91% specificity and 90% sensitivity. CONCLUSIONS A machine learning approach using data from clinical gait analysis was able to distinguish participants with hereditary spastic paraplegia from those with bilateral spastic cerebral palsy with high specificity and sensitivity. This algorithm can be used to assess whether individuals seen for gait disorders who do not yet have a definitive diagnosis have characteristics associated with hereditary spastic paraplegia. The results of the model inform the decision to suggest genetic testing to either confirm or refute the diagnosis of hereditary spastic paraplegia.
Key Words: Cerebral palsy; Gait; Hereditary spastic paraplegia; Machine learning; Regression trees
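The gait study converts cross-validated BART probabilities into diagnoses using a cutoff chosen to balance type I and type II errors, then reads sensitivity and specificity from the contingency table. A minimal sketch of that last step, assuming "balanced" means choosing the candidate cutoff at which sensitivity and specificity are closest (one reasonable reading, not necessarily the authors' rule). The probabilities, labels, and function names are hypothetical, and the BART model itself is not shown.

```python
def confusion_counts(probabilities, labels, threshold):
    """Count (tp, fp, tn, fn) when predicting positive for probability >= threshold."""
    tp = fp = tn = fn = 0
    for p, is_positive in zip(probabilities, labels):
        predicted = p >= threshold
        if predicted and is_positive:
            tp += 1
        elif predicted:
            fp += 1  # type I error
        elif is_positive:
            fn += 1  # type II error
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(probabilities, labels, threshold):
    tp, fp, tn, fn = confusion_counts(probabilities, labels, threshold)
    return tp / (tp + fn), tn / (tn + fp)


def balanced_threshold(probabilities, labels):
    """Pick the observed probability, used as a cutoff, at which sensitivity and
    specificity are closest (one simple way to balance the two error types)."""
    def gap(t):
        sens, spec = sensitivity_specificity(probabilities, labels, t)
        return abs(sens - spec)
    return min(sorted(set(probabilities)), key=gap)


# Hypothetical cross-validated probabilities of the rarer diagnosis
probs = [0.9, 0.8, 0.7, 0.4, 0.35, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, False, False, False]
cutoff = balanced_threshold(probs, labels)
print(cutoff, sensitivity_specificity(probs, labels, cutoff))
```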
11	Tree-Values: Selective Inference for Regression Trees. JOURNAL OF MACHINE LEARNING RESEARCH : JMLR 2022;23:305. [PMID: 38481523 PMCID: PMC10933572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2024]
Abstract: We consider conducting inference on the output of the Classification and Regression Tree (CART) (Breiman et al., 1984) algorithm. A naive approach to inference that does not account for the fact that the tree was estimated from the data will not achieve standard guarantees, such as Type 1 error rate control and nominal coverage. Thus, we propose a selective inference framework for conducting inference on a fitted CART tree. In a nutshell, we condition on the fact that the tree was estimated from the data. We propose a test for the difference in the mean response between a pair of terminal nodes that controls the selective Type 1 error rate, and a confidence interval for the mean response within a single terminal node that attains the nominal selective coverage. Efficient algorithms for computing the necessary conditioning sets are provided. We apply these methods in simulation and to a dataset involving the association between portion control interventions and caloric intake.
Key Words: CART; Regression trees; hypothesis testing; post-selection inference; selective inference
Grants: R01 GM123993 NIGMS NIH HHS
12	Machine learning for surgical time prediction. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2021;208:106220. [PMID: 34161848 DOI: 10.1016/j.cmpb.2021.106220] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 11/14/2020] [Accepted: 05/26/2021] [Indexed: 06/13/2023]
Abstract: BACKGROUND AND OBJECTIVE Operating Rooms (ORs) are among the most expensive services in hospitals. One challenge in optimizing OR efficiency is improving surgery scheduling, which requires estimating surgical duration. Surgeons or programming units typically estimate duration using an experience-based strategy, which may include biases such as overestimating surgery time, increasing ORs' operational costs. METHODS This paper analyzes a machine learning-based solution for surgical time prediction. We apply and compare four machine-learning algorithms (Linear Regression, Support Vector Machines, Regression Trees, and Bagged Trees) to predict surgical duration at a tertiary referral university hospital in Bogotá, Colombia. Historical data from 2004 to 2019 were used to train the algorithms. The algorithms were compared in terms of the Root Mean Square Error (RMSE) of the predicted surgery duration and their computing time. The algorithm with the best performance was compared with the currently used experience-based method. RESULTS All the ML algorithms predicted surgery duration with an error between 26 and 37 min. The best overall performance was obtained using Bagged Trees (26 min RMSE, 3.16 min training time, 0.49 min testing time) on a subset of the database containing the nine specialties that account for 80% of the surgeries. Bagged Trees also outperformed the experience-based method with a lower RMSE; however, the errors shifted from predominant overestimation to underestimation of surgery duration. CONCLUSIONS This study applied different ML algorithms to predict surgical duration and compared their performance. Bagged Trees showed the best performance in terms of RMSE and computing time. Depending on the initial data, Bagged Trees outperformed the experience-based method, but future work is needed to adapt it, like any other ML algorithm, to hospitals' needs.
Key Words: Assembly methods; Linear regression; Machine learning; Regression trees; Support vector machine; Surgical time prediction
MESH Headings: Algorithms; Humans; Linear Models; Machine Learning; Operative Time; Support Vector Machine
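Bagged Trees, the best performer in the surgical-time study, average trees fit to bootstrap resamples of the training data, and the comparison metric is RMSE. The sketch below illustrates both using one-split trees (stumps) instead of full trees to stay short; the single predictor, the durations, and all function names are hypothetical, and the sketch does not reproduce the study's models or results.

```python
import random
from math import sqrt


def fit_stump(x, y):
    """Least-squares regression stump on one feature: (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        err = (sum((v - left_mean) ** 2 for v in left)
               + sum((v - right_mean) ** 2 for v in right))
        if best is None or err < best[0]:
            best = (err, (pairs[i - 1][0] + pairs[i][0]) / 2, left_mean, right_mean)
    if best is None:  # bootstrap sample with a single distinct x: constant stump
        mean = sum(y) / len(y)
        return (float("inf"), mean, mean)
    return best[1:]


def stump_predict(stump, value):
    threshold, left_mean, right_mean = stump
    return left_mean if value <= threshold else right_mean


def bagged_stumps(x, y, n_bags=25, seed=0):
    """Fit one stump per bootstrap resample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    stumps = []
    for _ in range(n_bags):
        idx = rng.choices(range(n), k=n)
        stumps.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return stumps


def bagged_predict(stumps, value):
    """Average the predictions of all bootstrap stumps."""
    return sum(stump_predict(s, value) for s in stumps) / len(stumps)


def rmse(actual, predicted):
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical predictor values and surgery durations in minutes
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [60, 65, 62, 70, 66, 150, 155, 148, 160, 152]
stumps = bagged_stumps(x, y)
mean_only = [sum(y) / len(y)] * len(y)
bagged = [bagged_predict(stumps, v) for v in x]
print(round(rmse(y, mean_only), 1), round(rmse(y, bagged), 1))
```

Averaging over bootstrap fits mainly reduces variance; with full-depth trees, as in most bagging implementations, that variance reduction is larger than with stumps.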
13	A longitudinal view of successful aging with HIV: role of resilience and environmental factors. Qual Life Res 2021;31:1135-1145. [PMID: 34460077 DOI: 10.1007/s11136-021-02970-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2021] [Indexed: 10/20/2022]
Abstract: PURPOSE The purpose of this study is to estimate the extent to which people aging with HIV meet criteria for successful aging, as operationalized through health-related quality of life (HRQL), and maintain this status over time. A second objective is to identify factors that place people at promise for continued successful aging, including environmental and resilience factors. METHODS Participants were members of the Positive Brain Health Now (BHN) cohort. People ≥ 50 years (n = 513) were classified as aging successfully if they were at or above norms on 7 or 8 of the 8 HRQL domains of the RAND-36. Group-based trajectory analysis, regression tree analysis (a form of machine learning), and logistic regression were applied to identify factors predicting successful aging. RESULTS Seventy-three participants (14.2%) met criteria for successful aging at entry and did not change status over time. The most influential factor was loneliness, which split the sample into two groups: the prevalence of successful aging was 28.4% in the "almost never" lonely group compared with 4.6% in the "sometimes/often" lonely group. Other influential factors were feeling safe, social network, motivation, stigma, and socioeconomic status. These factors identified 17 subgroups with at least 30 members, in which the proportion classified as aging successfully ranged from 0 to 79.4%. The nine variables important to classifying successful aging had a predictive accuracy of 0.862. Adding self-reported cognition, but not cognitive test performance, improved this accuracy to 0.895. The two groups defined by successful aging status did not differ on age, sex, or viral load, nadir and current. CONCLUSION The results indicate the important role of social determinants of health in successful aging among people living with HIV.
Key Words: Classification; Determinants of health; HIV; Regression trees; Resilience; Successful aging
14	Configuration of daily grazing and searching of growing beef cattle in grassland: observational study. Animal 2021;15:100336. [PMID: 34371468 DOI: 10.1016/j.animal.2021.100336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Revised: 06/29/2021] [Accepted: 07/02/2021] [Indexed: 12/01/2022]
Abstract: Many studies in Campos grasslands focus on management aspects such as controlling herbage allowance, applying nutrients, and/or overseeding with legumes. However, there is little literature on how the Campos grassland resource is utilised, especially regarding the grazing pattern and the relationship between pasture quantity and quality and daily grazing activities. Studying ingestive behaviour in species-rich and heterogeneous native grasslands during daylight hours, and understanding how animals prioritise quality or quantity of intake in relation to pasture attributes, are important for comprehending the ingestive-digestive processes modulating the energy intake of animals and for achieving better grazing management. Therefore, the objective was to describe and quantify the daily grazing behaviour of growing cattle grazing native pasture with different structures resulting from different management practices, and to study the relationship between pasture attributes and intake through multivariate analysis. The study was carried out at the Faculty of Agronomy, Paysandú, Uruguay. Treatments were native grassland, overseeding with Trifolium pratense and Lotus tenuis + phosphorus, and native pasture + nitrogen-phosphorus. Grazing activities were classified into grazing, searching (defined as animals taking 1-2 bites at one feeding station and then moving to another feeding station, and so on), ruminating and idling. The probability of time allocated to each activity was continuously measured during daylight hours (0700-1930) and related to pasture structure and forage quality using regression tree models, while bite rate was determined every 2 h. The diurnal pattern of growing cattle showed grazing and searching sessions, followed by ruminating and idling sessions. The length of sessions (as the probability of time allocated to each activity) varied throughout the day. The grazing probability was greater in the afternoon than in the morning and at midday (0.74 vs 0.45 vs 0.46, respectively) and was associated with a higher bite rate (34.2 bites/min). Regression tree models showed different grazing, searching and ruminating strategies according to pasture attributes. During the morning, animals modified grazing, searching, ruminating and idling strategies according to bite rate, crude protein in the diet and herbage allowance. At midday, they only adjusted ruminating and idling, while during afternoon sessions, grazing activities were modified by pasture quantity attributes such as herbage mass and herbage allowance. By controlling herbage allowance, herbage mass and pasture height, animals prioritise quality in the morning and quantity in the afternoon, integrating and modifying the grazing-searching and ruminating-idling pattern.
Key Words: Cattle ingestive behaviour; Grazing management; Grazing pattern; Regression trees; Searching strategy
15	Clinical signs associated with earlier diagnosis of children with autism Spectrum disorder. BMC Pediatr 2021;21:96. [PMID: 33632186 PMCID: PMC7905573 DOI: 10.1186/s12887-021-02551-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Accepted: 02/10/2021] [Indexed: 11/10/2022] Open Abstract BACKGROUND The objective of this study is to gain new insights into the relationship between clinical signs and age at diagnosis. METHOD We utilize a new, large, online survey of 1743 parents of children diagnosed with ASD, and use multiple statistical approaches. These include regression analysis, factor analysis, and machine learning (regression tree). RESULTS We find that clinical signs that most strongly predict early diagnosis are not necessarily specific to autism, but rather those that initiate the process that eventually leads to an ASD diagnosis. Given the high correlations between symptoms, only a few signs are found to be important in predicting early diagnosis. For several clinical signs we find that their presence and intensity are positively correlated with delayed diagnosis (e.g., tantrums and aggression). Even though our data are drawn from parents' retrospective accounts, we provide evidence that parental recall bias and/or hindsight bias did not play a significant role in shaping our results. CONCLUSION In the subset of children without early deficits in communication, diagnosis is delayed, and this might be improved if more attention will be given to clinical signs that are not necessarily considered as ASD symptoms. Our findings also suggest that careful attention should be paid to children showing excessive tantrums or aggression, as these behaviors may interfere with an early ASD diagnoses. 
Key Words: Autism spectrum disorder; Clinical signs; Diagnosis age; Early diagnosis; Regression trees; Symptoms
MESH Headings: Autism Spectrum Disorder/diagnosis; Child; Communication; Early Diagnosis; Humans; Parents; Retrospective Studies
Grants: Organization for Autism Research
16	Treed distributed lag nonlinear models. Biostatistics 2021;23:754-771. [PMID: 33527997 DOI: 10.1093/biostatistics/kxaa051] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 06/11/2020] [Revised: 10/07/2020] [Accepted: 11/01/2020] [Indexed: 11/14/2022] Abstract In studies of maternal exposure to air pollution, a child's health outcome is regressed on exposures observed during pregnancy. The distributed lag nonlinear model (DLNM) is a statistical method commonly implemented to estimate an exposure-time-response function when the exposure effect is postulated to be nonlinear. Previous implementations of the DLNM estimate an exposure-time-response surface parameterized with a bivariate basis expansion. However, basis functions such as splines assume smoothness across the entire exposure-time-response surface, which may be unrealistic in settings where the exposure is associated with the outcome only in a specific time window. We propose a framework for estimating the DLNM based on Bayesian additive regression trees. Our method operates using a set of regression trees that each assume piecewise constant relationships across the exposure-time space. In a simulation, we show that our model outperforms spline-based models when the exposure-time surface is not smooth, while both methods perform similarly in settings where the true surface is smooth. Importantly, the proposed approach has lower variance and more precisely identifies critical windows during which exposure is associated with a future health outcome. We apply our method to estimate the association between maternal exposure to PM2.5 and birth weight in a Colorado, USA, birth cohort.
Key Words: Air pollution; Children’s health; Critical windows; Distributed lag; Regression trees
17	A bio-economic simulation study on the association between key performance indicators and pluck lesions in Irish farrow-to-finish pig farms. Porcine Health Manag 2020;6:40. [PMID: 33298194 PMCID: PMC7724844 DOI: 10.1186/s40813-020-00176-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/04/2020] [Accepted: 11/24/2020] [Indexed: 11/20/2022] Abstract Background Pluck lesions are associated with decreased performance in grower-finisher pigs, but their economic impact needs further investigation. This study aimed to identify the main pluck lesions, and the cut-off values for their prevalence, associated with changes in average daily gain (ADG) during the wean-to-finish period, and to simulate their effects on the economic performance of farrow-to-finish farms. Pigs (n = 162 ± 51.9 per farm) from 56 farrow-to-finish farms were inspected at slaughter, and the prevalence of enzootic pneumonia-like lesions, pleurisy, lung scars, abscesses, pericarditis, and liver milk spots was estimated. For each farm, annual performance indicators were obtained. Regression tree analysis (RTA) was used to identify pluck lesions and to estimate cut-off values for their prevalence associated with changes in ADG. Different scenarios were simulated according to the RTA results, and economic and risk analyses were performed using the Teagasc Pig Production Model. Risk analysis was performed by Monte Carlo sampling using the Microsoft Excel add-in @Risk with 10,000 iterations. Results Pleurisy and lung scars were the main lesions associated with changes in ADG. Three scenarios were simulated from the RTA results for a 728-sow farrow-to-finish farm with a prevalence of i) pleurisy < 25% and lung scars < 8% (LPLSC; ADG = 760 g); ii) pleurisy < 25% and lung scars ≥ 8% (LPHSC; ADG = 725 g); and iii) pleurisy ≥ 25% (HP; ADG = 671 g).
The economic analysis showed higher feed costs, higher costs for the disposal of dead animals, and lower sales in the HP and LPHSC scenarios than in the LPLSC scenario, thereby reducing gross margin and net profit. The risk analysis showed a lower probability of reaching any given level of profit in the HP scenario than in the LPHSC and LPLSC scenarios. Conclusion Under the conditions of this study, higher prevalences of pleurisy and lung scars were associated with decreased ADG during the grower-finisher period and with lower economic returns in the simulated farms. These results highlight the economic benefits and importance of preventing and/or controlling respiratory disease.
Key Words: Economic modelling; Lung scars; Pig production systems; Pleurisy; Regression trees; Stochastic budgeting
18	Intraseasonal variation of phycocyanin concentrations and environmental covariates in two agricultural irrigation ponds in Maryland, USA. ENVIRONMENTAL MONITORING AND ASSESSMENT 2020;192:706. [PMID: 33064217 DOI: 10.1007/s10661-020-08664-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2020] [Accepted: 10/05/2020] [Indexed: 06/11/2023] Abstract Recently, cyanobacteria blooms have become a concern for agricultural irrigation water quality. Numerous studies have shown that cyanotoxins from these harmful algal blooms (HABs) can be transported to and assimilated into crops when present in irrigation waters. Phycocyanin is a pigment known to occur only in cyanobacteria and is often used to indicate the presence of cyanobacteria in waters. The objective of this work was to use machine learning methodology to identify the environmental covariates that most influence phycocyanin concentrations in agricultural irrigation ponds that experience blooms of the potentially toxigenic cyanobacteria genera Microcystis and Aphanizomenon. The study was performed at two agricultural irrigation ponds over a 5-month period in the summer of 2018. Phycocyanin concentrations were measured along with sensor-based and fluorometer-based water quality parameters, including turbidity (NTU), pH, dissolved oxygen (DO), fluorescent dissolved organic matter (fDOM), conductivity, chlorophyll, colored dissolved organic matter (CDOM), and extracted chlorophyll. Regression tree analyses were used to determine which water quality parameters most influenced phycocyanin concentrations. Nearshore sampling locations had higher phycocyanin concentrations than interior sampling locations, and "zones" of consistently higher phycocyanin concentrations were found in both ponds.
The regression tree analyses indicated that extracted chlorophyll, CDOM, and NTU were the three parameters with the greatest influence on phycocyanin concentrations. This study indicates that sensor-based and fluorometer-based water quality parameters could be useful for identifying spatial patterns of phycocyanin concentrations, and therefore of cyanobacteria blooms, in agricultural irrigation ponds and potentially other water bodies.
Key Words: Cyanobacteria; Harmful algal blooms; Irrigation ponds; Monitoring; Phycocyanin; Regression trees; Water quality
MESH Headings: Agricultural Irrigation; Environmental Monitoring; Maryland; Phycocyanin; Ponds
19	Seasonality of mean flows as a potential tool for the assessment of ecological processes: Mountain rivers, Polish Carpathians. THE SCIENCE OF THE TOTAL ENVIRONMENT 2020;716:136988. [PMID: 32059323 DOI: 10.1016/j.scitotenv.2020.136988] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 10/03/2019] [Revised: 01/13/2020] [Accepted: 01/27/2020] [Indexed: 06/10/2023] Abstract The classification of river catchments according to their hydrological regime is a crucial element of regionalisation. In the absence of hydrological data, catchment regionalisation methods may be used to assess many flow characteristics, such as regime or design flow, and thus help in the analysis of hydrological and ecological processes and in the management of water resources. Correct classification of catchments requires knowledge of the main factors that influence river regime, such as meteorological conditions, land cover/land use, geology, soil properties, terrain features, and human activities. The aim of the study was to analyse the relationship between selected catchment attributes, together with precipitation climatology, and the seasonality of mean flows (MQ) in the mountain rivers of the Upper Vistula basin (the Vistula is the largest and most important river in Poland), and to regionalise the catchments on the basis of a seasonality index. To achieve this objective, we concentrated on the mountain stream and river catchments of the Upper Vistula basin (all of which are Vistula tributaries) and employed Colwell's seasonality index in an attempt to clarify these ecohydrological measures. The study confirmed that in mountainous catchments, where the response time to rainfall is shorter because of steeper slopes, a higher seasonality of mean monthly discharges, as expressed by the seasonality index M, is observed.
In this case, the variability of seasonal rainfall affected the seasonality of MQ. In catchments with gentler slopes, greater forest cover and larger areas, the seasonality of flows was lower. The innovative aspect of the presented study is the attempt to correlate Colwell's seasonality index with the physiographic and meteorological characteristics of the catchment. Until now, catchment characteristics have been used as factors differentiating the hydrological regime of catchments, thus allowing similar catchments to be grouped. Our results foster a better understanding of natural processes in the river basin, which would help in better management of the environment and of its relationship with the large number of people who live there and depend on it. These results show that regression tree methods based on the CART algorithm can be used as an effective tool for the classification of catchments.
Key Words: Mountain catchments; Regionalisation; Regression trees; Seasonality of flows; Substrate permeability
20	EEG multifractal analysis correlates with cognitive testing scores and clinical staging in mild cognitive impairment. J Clin Neurosci 2020;76:195-200. [PMID: 32307299 DOI: 10.1016/j.jocn.2020.04.003] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 01/16/2020] [Accepted: 04/02/2020] [Indexed: 11/24/2022] Abstract Alzheimer's disease and mild cognitive impairment are increasingly prevalent global health concerns in aging industrialized societies. There are only limited non-invasive biomarkers for the cognitive and functional impairment associated with dementia. Multifractal analysis of EEG has recently been proposed as a potentially improved method of quantitative EEG analysis compared with existing techniques (e.g., spectral analysis). We used an existing database from a study of healthy elderly patients (N = 20) who were assessed with cognitive testing (Folstein Mini-Mental State Examination; MMSE) and resting-state EEG (4 leads). Each subject's EEG was separated into two 30 s tracings, one for training and one for testing a statistical model against the MMSE scores. We compared multifractal detrended fluctuation analysis (MF-DFA) with the Fourier transform (FT) in their ability to produce an accurate classification and regression tree estimator for the test EEG segments. The MMSE estimates of the MF-DFA-based statistical model correlated strongly with the actual MMSE when applied to the test EEG parameter dataset, whereas those of the corresponding FT-based model did not. Using a standardized MMSE cut-off value for clinical staging, the MF-DFA-based statistical model was both sensitive and specific for the clinical staging of both mild Alzheimer's disease and mild cognitive impairment. MF-DFA shows promise as a method of quantitative EEG analysis to accurately estimate cognition in Alzheimer's disease.
Key Words: Alzheimer’s disease; EEG; Electroencephalography; Mild cognitive impairment; Multifractal; Regression trees
21	Integrating critical values of soil drivers for mitigating GHGs: An assessment in a sugarcane cropping system. THE SCIENCE OF THE TOTAL ENVIRONMENT 2020;704:135420. [PMID: 31812389 DOI: 10.1016/j.scitotenv.2019.135420] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/31/2019] [Revised: 11/05/2019] [Accepted: 11/05/2019] [Indexed: 06/10/2023] Abstract Agricultural practices can reduce emissions of greenhouse gases (GHGs). The definition of management practices aimed at mitigating GHG emissions could gain accuracy by integrating critical values of soil variables related to GHG fluxes. The aim of this study was to combine the critical values of soil variables that determine groups of GHG fluxes with similar and/or opposite directions for carbon dioxide (CO₂), nitrous oxide (N₂O) and methane (CH₄). We determined CO₂, N₂O and CH₄ fluxes, soil temperature, gravimetric soil moisture (GSM), soil inorganic nitrogen (SIN), soil bulk density (SBD), soil porosity (P), and water-filled pore space (WFPS) monthly over three consecutive growing seasons in a sugarcane agroecosystem. The regression tree method defined groups of emissions. Six terminal groups were determined for CO₂ and N₂O fluxes, and four for CH₄ fluxes. The critical values of soil variables that defined the terminal groups with the highest fluxes were soil temperature (>19 °C) and GSM (>35.2%) for CO₂; GSM (>29.2%) and SIN (≤1.1 ppm) for N₂O; and GSM (>24.9%), SBD (>0.98 g cm⁻³) and SIN (>1.82 ppm) for CH₄. Trade-offs were found among GHGs: N₂O emissions were high and CO₂ emissions were low when GSM and soil temperature ranged from 29% to 35% and from 14 to 19 °C, respectively (moderate values). CO₂ emissions were high and N₂O emissions were lowest when GSM was equal to or lower than 29.2% and soil temperature ranged from 19 to 21.3 °C.
In this study, we highlight that management practices aimed at mitigating GHG fluxes should consider an integrated analysis of the critical values of soil variables for all GHGs together in order to avoid trade-offs.
Key Words: Greenhouse gases; Mitigation; Regression trees; Trade-offs
MESH Headings: Agriculture/methods; Air Pollution/prevention & control; Greenhouse Gases/analysis; Saccharum; Soil
22	Prevalence and correlates of diabetes among criminal justice-involved individuals in the United States. Ann Epidemiol 2019;36:55-61. [PMID: 31301945 DOI: 10.1016/j.annepidem.2019.05.004] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/08/2018] [Revised: 05/02/2019] [Accepted: 05/29/2019] [Indexed: 01/02/2023] Abstract PURPOSE Diabetes is one of the most prevalent and fastest-growing adverse health conditions in the United States and disproportionately affects the demographic and socioeconomic groups that are also more likely to be involved with the criminal justice (CJ) system. This study examines the prevalence and correlates of diabetes among CJ-involved individuals in the United States. METHODS Using traditional statistical modeling and modern machine learning methods, we analyzed data from the National Survey on Drug Use and Health to compare the correlates and predictive interactions of diabetes diagnosis between respondents on probation or parole and an age- and gender-matched sample of respondents who were not. RESULTS Subjects involved in the CJ system were 15% more likely (1.66% vs. 1.44%, P = .015) to report a past-year diagnosis of diabetes than a sample of noninvolved individuals matched by age and sex, although this association was not statistically significant after adjustment for demographic and behavioral confounders. Similar trends in diabetes prevalence emerged for the non-CJ and CJ groups with regard to income, depression (OR 2.38 and 1.65 for the CJ and non-CJ groups, respectively), and attainment of a college education (OR 0.64 and 0.30 for the CJ and non-CJ groups, respectively, compared with those with less than a high school education).
The results also suggested that a generally high propensity toward risk taking was associated with lower odds of diabetes in the non-CJ group (OR 0.78; 95% CI 0.69-0.87) but with higher odds of diabetes in the CJ group (OR 1.38; 95% CI 1.02-1.85). CONCLUSIONS Involvement in the U.S. CJ system is correlated with a higher prevalence of diabetes and with differing risk factors for diabetes diagnosis. Further research is necessary, however, to unpack the precise causal pathways that underlie the associations observed in the current analysis.
Key Words: Diabetes; Machine learning; Parolees; Probationers; Regression trees
23	A regression-tree multilayer-perceptron hybrid strategy for the prediction of ore crushing-plate lifetimes. J Adv Res 2019;18:173-184. [PMID: 31032118 PMCID: PMC6479016 DOI: 10.1016/j.jare.2019.03.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 12/15/2018] [Revised: 03/21/2019] [Accepted: 03/21/2019] [Indexed: 11/02/2022] Abstract Highly tensile manganese steel is in great demand owing to its high tensile strength under shock loads. All workpieces are produced by casting, because the material is very difficult to machine. The probabilistic aspects of its casting, its variable composition, and the different casting techniques must all be considered to optimise its mechanical properties. A hybrid strategy is therefore proposed that combines decision trees and artificial neural networks (ANNs) to build accurate and reliable prediction models for ore crushing plate lifetimes. The strategic blend of these two high-accuracy prediction models is used to generate simple decision trees that can reveal the main dataset features, thereby facilitating decision-making. Following a complexity analysis of a dataset of 450 different plates, the best model consisted of 9 different multilayer perceptrons whose only inputs were the Fe and Mn plate compositions. The model recorded a low root mean square error (RMSE) of only 0.0614 h for the plate lifetime: a very accurate result considering that lifetimes in the dataset varied between 746 and 6902 h. Finally, the use of these models under real industrial conditions is presented in a heat map, namely a 2D representation of the main manufacturing process inputs with a colour scale that shows the predicted output, i.e. the expected lifetime of the manufactured plates.
Thus, the hybrid strategy captures the core information of the training dataset in high-accuracy prediction models. This novel strategy merges the different capabilities of two families of machine-learning algorithms. It provides a high-accuracy industrial tool for predicting the full lifetime of highly tensile manganese steel plates. The results yielded a precise prediction (RMSE of 0.061 h) of the full lifetime of light, medium, and heavy crusher plates manufactured with three casting methods: experimental, classic, and a new highly efficient method.
Key Words: Artificial intelligence; Hadfield steel; Lifetime prediction; Multi-layer perceptrons; Regression trees; Resource savings
24	Unsupervised Gene Network Inference with Decision Trees and Random Forests. Methods Mol Biol 2019;1883:195-215. [PMID: 30547401 DOI: 10.1007/978-1-4939-8882-2_8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/11/2022] Abstract In this chapter, we introduce the reader to a popular family of machine learning algorithms called decision trees. We then review several approaches based on decision trees that have been developed for the inference of gene regulatory networks (GRNs). Decision trees have several properties that make them well suited to this problem: they can detect multivariate interacting effects between variables, are non-parametric, scale well, and have very few parameters. In particular, we describe in detail the GENIE3 algorithm, a state-of-the-art method for GRN inference.
Key Words: Decision trees; Machine learning; Random forest; Regression trees; Tree ensembles
MESH Headings: Computational Biology/instrumentation; Computational Biology/methods; Decision Trees; Gene Expression Regulation; Gene Regulatory Networks; Models, Genetic; Unsupervised Machine Learning
25	Cyanotoxin level prediction in a reservoir using gradient boosted regression trees: a case study. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2018;25:22658-22671. [PMID: 29846899 DOI: 10.1007/s11356-018-2219-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/24/2017] [Accepted: 05/03/2018] [Indexed: 06/08/2023] Abstract Cyanotoxins are poisonous compounds produced by cyanobacteria, and they pose a health threat in waters that could be used for drinking or recreational purposes. It is therefore necessary to predict their presence to avoid risks. This paper presents a nonparametric machine learning approach using a gradient boosted regression tree (GBRT) model to predict cyanotoxin content from cyanobacterial concentrations determined experimentally in a reservoir in northern Spain. GBRT models achieve good predictions in highly nonlinear problems such as this one, where the studied variable shows low cyanotoxin concentrations interspersed with high concentration peaks. Two types of results have been obtained: firstly, the model allows the input variables to be ranked according to their importance in the model. Finally, the high performance and simplicity of the model make the gradient boosted tree method attractive compared with conventional forecasting techniques.
Key Words: Cyanobacteria; Cyanotoxins; Gradient boosting; Harmful algal blooms (HABs); Regression trees; Statistical machine learning techniques
MESH Headings: Bacterial Toxins/analysis; Cyanobacteria/chemistry; Lakes/analysis; Machine Learning; Regression Analysis; Spain; Statistics, Nonparametric; Water Supply
26	Online monitoring and conditional regression tree test: Useful tools for a better understanding of combined sewer network behavior. THE SCIENCE OF THE TOTAL ENVIRONMENT 2018;625:336-343. [PMID: 29289781 DOI: 10.1016/j.scitotenv.2017.12.239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2017] [Revised: 12/20/2017] [Accepted: 12/20/2017] [Indexed: 06/07/2023] Abstract A good knowledge of the dynamics of pollutant concentration and flux in a combined sewer network is necessary when considering solutions to limit the pollutants discharged by combined sewer overflows (CSOs) into receiving waters during wet weather. Identifying the parameters that influence pollutant concentration and flux is important. Nevertheless, few studies have obtained satisfactory results in identifying these parameters with statistical tools. This work therefore uses a large database of rain events (116 over one year), obtained by continuous measurement of rainfall, discharge flow and chemical oxygen demand (COD) estimated from online turbidity, to identify these parameters. We carried out a statistical study of the parameters influencing the maximum COD concentration, the discharge flow and the discharge COD flux. In this study, we used a test that had never before been applied in this field: the conditional regression tree test. We have demonstrated that the antecedent dry weather period, the average rain event intensity and the flow before the event are the three main factors influencing the maximum COD concentration during a rainfall event. The discharge flow is mainly influenced by the overall rainfall depth but not by the maximum rainfall intensity. Finally, the COD discharge flux is influenced by the discharge volume and the maximum COD concentration.
Regression trees seem much more appropriate than common methods such as PCA and PLS for this type of study, as they take into account the thresholds and cumulative effects of various parameters as a function of the target variable. These results could help to improve sewer and CSO management in order to decrease the discharge of pollutants into receiving waters.
Key Words: Combined sewer system; Continuous monitoring; Pollutant; Regression trees; Statistical study; Turbidity
27	Hospital heterogeneity: what drives the quality of health care. THE EUROPEAN JOURNAL OF HEALTH ECONOMICS : HEPAC : HEALTH ECONOMICS IN PREVENTION AND CARE 2018;19:385-408. [PMID: 28439750 PMCID: PMC5978923 DOI: 10.1007/s10198-017-0891-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 07/08/2016] [Accepted: 03/28/2017] [Indexed: 05/29/2023] Abstract A major feature of health care systems is substantial variation in health care quality across hospitals. The quality of stroke care varies widely across NHS hospitals. We investigate factors that may explain variations in health care quality, using measures of the quality of stroke care. We combine NHS trust data from the National Sentinel Stroke Audit with other data sets from the Office for National Statistics, the NHS and the census to capture hospitals' human and physical assets and organisational characteristics. We employ a class of non-parametric methods to explore the complex structure of the data and a set of correlated random effects models to identify key determinants of the quality of stroke care. The organisational quality of the process of stroke care emerges as a fundamental driver of the clinical quality of stroke care. There are rich complementarities among the drivers of the quality of stroke care. The findings strengthen previous research on managerial and organisational determinants of health care quality.
Key Words: Health care quality; Machine learning; Mixed effects model; NHS; Panel data; Prediction; Regression trees; Stroke
MESH Headings: Hospitals/statistics & numerical data; Humans; Quality of Health Care; Stroke/diagnosis; Stroke/therapy
28	Assessment of wastewater treatment facility compliance with decreasing ammonia discharge limits using a regression tree model. THE SCIENCE OF THE TOTAL ENVIRONMENT 2017;598:249-257. [PMID: 28441603 DOI: 10.1016/j.scitotenv.2017.03.236] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 01/20/2017] [Revised: 03/08/2017] [Accepted: 03/25/2017] [Indexed: 05/13/2023] Abstract A regression tree-based diagnostic approach is developed to evaluate factors affecting US wastewater treatment plant compliance with ammonia discharge permit limits, using Discharge Monitoring Report (DMR) data from a sample of 106 municipal treatment plants for the period 2004-2008. The predictor variables used to fit the regression tree are selected using random forests and consist of the previous month's effluent ammonia, influent flow rates and plant capacity utilization. The tree models are first used to evaluate compliance with existing ammonia discharge standards at each facility and then applied assuming more stringent discharge limits, which are under consideration in many states. The model predicts that the ability to meet both current and future limits depends primarily on the previous month's treatment performance. With more stringent discharge limits, the predicted ammonia concentration relative to the discharge limit increases. In-sample validation shows that the regression trees can provide a median classification accuracy of >70%. The regression tree model is validated using ammonia discharge data from an operating wastewater treatment plant and accurately predicts the observed ammonia discharge category approximately 80% of the time, indicating that the regression tree model can be applied to predict compliance for individual treatment plants, providing practical guidance for utilities and regulators with an interest in controlling ammonia discharges.
The proposed methodology is also used to demonstrate how to delineate reliable sources of demand and supply in a point source-to-point source nutrient credit trading scheme, and how planners and decision makers can set reasonable discharge limits in the future.
Key Words: Ammonia; Ammonia permit limit; Discharge permit compliance; Regression trees; Wastewater
29	Quint: An R package for the identification of subgroups of clients who differ in which treatment alternative is best for them. Behav Res Methods 2017;48:650-63. [PMID: 26092391 PMCID: PMC4891398 DOI: 10.3758/s13428-015-0594-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 11/08/2022] Abstract In the analysis of randomized controlled trials (RCTs), treatment effect heterogeneity often occurs, implying differences in treatment efficacy across (subgroups of) clients. This phenomenon is typically referred to as treatment-subgroup interaction. Identifying subgroups of clients, defined in terms of pretreatment characteristics, that are involved in a treatment-subgroup interaction is a methodologically challenging task, especially when many characteristics that may interact with treatment are available and no comprehensive a priori hypotheses on relevant subgroups exist. A special type of treatment-subgroup interaction occurs if the ranking of treatment alternatives in terms of efficacy differs across subgroups of clients (e.g., for one subgroup treatment A is better than B, and for another subgroup treatment B is better than A). These are called qualitative treatment-subgroup interactions and are the most important for optimal treatment assignment. The method QUINT (Qualitative INteraction Trees) was recently proposed to induce subgroups involved in such interactions from RCT data. The result of a QUINT analysis is a binary tree from which treatment assignment criteria can be derived. The implementation of this method, the R package quint, is the topic of this paper. The analysis process is described step by step using data from the Breast Cancer Recovery Project, showing the reader all the functions included in the package.
The output is explained and given a substantive interpretation. Furthermore, an overview is given of the tuning parameters involved in the analysis, along with possible motivational concerns associated with choice alternatives that are available to the user. Collapse Key Words Computer software Moderator Regression trees Subgroup analysis Treatment efficacy Treatment-subgroup interaction Collapse MESH Headings Collapse Grants Collapse
30	Rainfall-induced fecal indicator organisms transport from manured fields: model sensitivity analysis. Environ Int 2014;63:121-9. [PMID: 24291764 DOI: 10.1016/j.envint.2013.11.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/11/2013] [Revised: 11/01/2013] [Accepted: 11/05/2013] [Indexed: 06/02/2023]
Abstract: The microbial quality of surface waters attracts attention because of food- and waterborne disease outbreaks. Fecal indicator organisms (FIOs) are commonly used to evaluate the level of microbial pollution. Models predicting the fate and transport of FIOs are required to design and evaluate best management practices that reduce microbial pollution in ecosystems and water sources, and thus help to predict the risk of food- and waterborne diseases. In this study we performed a sensitivity analysis of the KINEROS/STWIR model, developed to predict FIO transport out of manured fields to other fields and water bodies, in order to identify the input variables that control transport uncertainty. The distributions of model input parameters were set to encompass values found in three-year experiments at the USDA-ARS OPE3 experimental site in Beltsville and in publicly available information. Sobol' indices and complementary regression trees were used to perform a global sensitivity analysis of the model and to explore how interactions between model input parameters affect the proportion of FIOs removed from fields. Regression trees provided a useful visualization of how the sensitivity of the model output differs across parts of the input variable domain. Environmental controls such as soil saturation, rainfall duration and rainfall intensity had the largest influence on model behavior, whereas soil and manure properties ranked lower. Field length had only a moderate effect on the sensitivity of the model output to the model inputs. Among the manure-related properties, the parameter determining the shape of the FIO release kinetic curve had the largest influence on the removal of FIOs from the fields, underscoring the need to better characterize FIO release kinetics. Since the most sensitive model inputs are available in soil and weather databases or can be obtained using soil water models, the results indicate an opportunity to obtain large-scale estimates of FIO transport from fields based on publicly available rather than site-specific information.
Key Words: Fecal indicator organisms; Global sensitivity analysis; Regression trees; Release and transport
MeSH Headings: Environmental Monitoring; Feces/microbiology; Manure/microbiology; Models, Statistical; Rain; Soil Microbiology; Uncertainty
31	Assessing large spatial scale landscape change effects on water quality and quantity response in the lower Athabasca River basin. Integr Environ Assess Manag 2013;9:392-404. [PMID: 22778001 DOI: 10.1002/ieam.1336] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/31/2012] [Revised: 04/16/2012] [Accepted: 07/02/2012] [Indexed: 06/01/2023]
Abstract: Increased land use intensity has been shown to adversely affect aquatic ecosystems. Multiple landscape stressors interact over space and time, producing cumulative effects. Cumulative Effects Assessment (CEA) is the process of evaluating the impact a development project may have on its ecological surroundings, but several challenges make current approaches to CEA ineffective. To improve approaches to CEA, the main objective of this study was to compare the results of different methods used to link landscape stressors with stream responses in a highly developed watershed, where past work has shown that the river has experienced significant changes in water quality and quantity. The study site was the lower reaches of the Athabasca River, Canada, which have been subjected to a diverse range of intense anthropogenic developments since the late 1960s. Linkages between landscape change and river response were evaluated using correlation analyses, stepwise and multiple regression, and regression trees. Notable landscape changes include increased industrial development and forest cut-blocks, evident from satellite imagery and supporting ancillary data sets. Simple regression analyses showed that water use was closely associated with total phosphorus (TP) and Na⁺ concentrations, as well as with specific conductance. The regression trees for total organic carbon (TOC), TP, and Na⁺ showed that the landscape variables selected for the first split were the same variables that showed significant relations in the respective simple regression models. Simple, stepwise, and multiple regression, in conjunction with regression trees, were useful in this study for capturing the strongest associations between landscape stressors and river response variables. The results highlight the need for improved scaling methods and monitoring strategies, which are crucial to managing cumulative effects on river systems.
Key Words: Cumulative effects assessment; Framework; Regression trees; Stream monitoring
MeSH Headings: Alberta; Environment; Environmental Monitoring/methods; Fresh Water/chemistry; Models, Theoretical; Regression Analysis; Water Pollutants, Chemical/analysis; Water Quality
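Several entries above (for example item 31, where the variable appearing first in each regression tree is compared with the simple regression results) depend on how a regression tree chooses its first split. The sketch below illustrates one common rule, a CART-style least-squares criterion: every candidate threshold on every predictor is tried, and the split with the smallest total within-node sum of squared errors is kept. This is an assumption for illustration only; the cited studies may have used different software, criteria, and stopping rules. The function names, variable names (`water_use`, `cutblock_area`, `tp`), and data values are hypothetical and are not taken from any of the studies.

```python
from typing import Optional, Sequence, Tuple


def sse(values: Sequence[float]) -> float:
    """Sum of squared deviations from the node mean (0.0 for an empty node)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_root_split(
    features: dict, y: Sequence[float]
) -> Optional[Tuple[str, float, float]]:
    """Return (feature, threshold, total_sse) for the least-squares root split.

    Hypothetical helper: candidate thresholds are midpoints between consecutive
    distinct values of each predictor; observations with x <= threshold go left.
    """
    best = None
    for name, x in features.items():
        distinct = sorted(set(x))
        for lo, hi in zip(distinct, distinct[1:]):
            threshold = (lo + hi) / 2
            left = [yi for xi, yi in zip(x, y) if xi <= threshold]
            right = [yi for xi, yi in zip(x, y) if xi > threshold]
            total = sse(left) + sse(right)
            # Keep the first split that achieves the smallest total SSE.
            if best is None or total < best[2]:
                best = (name, threshold, total)
    return best


if __name__ == "__main__":
    # Hypothetical toy data: one response and two candidate landscape predictors.
    predictors = {
        "water_use": [1, 2, 3, 10, 11, 12],
        "cutblock_area": [5, 1, 4, 2, 6, 3],
    }
    tp = [0.10, 0.12, 0.11, 0.30, 0.32, 0.31]
    print(best_root_split(predictors, tp))
```

On this toy data the root split falls on `water_use` between 3 and 10, because that threshold separates the low and high response values almost perfectly; a full tree would repeat the same search recursively within each child node.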