Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 34 (from Reference Citation Analysis)
Article PDFs: 12
Cited by > 0: 20
Searched Name: Random forest (RF)

Citation Analysis

Hennebelle A, Ismail L, Materwala H, Al Kaabi J, Ranjan P, Janardhanan R. Secure and privacy-preserving automated machine learning operations into end-to-end integrated IoT-edge-artificial intelligence-blockchain monitoring system for diabetes mellitus prediction. Comput Struct Biotechnol J 2024;23:212-233. [PMID: 38169966 PMCID: PMC10758733 DOI: 10.1016/j.csbj.2023.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 11/20/2023] [Accepted: 11/20/2023] [Indexed: 01/05/2024]

Moezzi SMM, Mohammadi M, Mohammadi M, Saloglu D, Sheikholeslami R. Machine learning insights into PM2.5 changes during COVID-19 lockdown: LSTM and RF analysis in Mashhad. Environ Monit Assess 2024;196:453. [PMID: 38619639 DOI: 10.1007/s10661-024-12567-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Accepted: 03/23/2024] [Indexed: 04/16/2024]

Abstract

This study investigates the impact of COVID-19 lockdown measures on air quality in the city of Mashhad using two strategies. We began with basic statistical methods, such as paired-sample t-tests, to compare hourly PM2.5 data in two scenarios: before and during quarantine, and pre- and post-lockdown. This initial analysis provided a broad understanding of potential changes in air quality. Notably, only a small reduction of 2.40% in PM2.5 was recorded relative to air quality prior to the lockdown period. This finding highlights the wide range of factors that affect particulate matter levels in urban settings, with the transportation sector widely recognized as one of the principal causes. Nevertheless, throughout the period after the quarantine, a remarkable decrease in air quality was observed, characterized by distinct seasonal patterns, in contrast to previous years. This finding demonstrates a significant correlation between changes in human mobility patterns and urban air quality. It also emphasizes the need to use air pollution modeling as a fundamental tool to evaluate and understand these linkages in support of long-term plans for reducing air pollution. To obtain a more quantitative understanding, we then employed machine learning methods, namely random forest (RF) and long short-term memory (LSTM) algorithms, to determine the effect of the lockdown on PM2.5 levels. Our models proved effective in assessing the pollutant concentration in Mashhad during lockdown measures. The test set yielded an R-squared value of 0.82 for the LSTM network model, whereas the RF model showed a cross-validation R-squared of 0.78. The computational cost of training the LSTM and RF models across all data was 25 min and 3 s, respectively.
In summary, by integrating statistical methods and machine learning, this research attempts to provide a comprehensive understanding of the impact of human interventions on air quality dynamics.
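
The R-squared values quoted in the abstract above are coefficients of determination. As a minimal sketch of how such a score is computed from observed and predicted values, assuming the usual definition 1 - SS_res / SS_tot (the abstract does not state which variant the authors used), the following uses made-up PM2.5 numbers; the function name and the data are hypothetical and not taken from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and equally long")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared gaps between observation and prediction.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared gaps between observation and the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical hourly PM2.5 values (not from the study).
observed = [30.0, 42.0, 35.0, 50.0, 28.0]
predicted = [32.0, 40.0, 36.0, 47.0, 30.0]
score = r_squared(observed, predicted)
print(f"R-squared: {score:.3f}")
```

A cross-validated R-squared, as reported for the RF model, would typically average this score over held-out folds rather than a single test set.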

Gupta P, Shukla DP. Demi-decadal land use land cover change analysis of Mizoram, India, with topographic correction using machine learning algorithm. Environ Sci Pollut Res Int 2024:10.1007/s11356-024-33094-3. [PMID: 38609681 DOI: 10.1007/s11356-024-33094-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2023] [Accepted: 03/22/2024] [Indexed: 04/14/2024]

Abstract

Mizoram (India) is part of UNESCO's biodiversity hotspots in India and is primarily populated by tribes who practise shifting agriculture. Hence, the land use land cover (LULC) pattern of the state changes frequently. We used Landsat 5 and 8 satellite images to prepare LULC maps at 5-year intervals from 2000 to 2020. The atmospherically corrected images were pre-processed to remove cloud cover and then classified into six classes: waterbodies, farmland, settlement, open forest, dense forest, and bare land. We applied four machine learning (ML) algorithms for classification, namely, random forest (RF), classification and regression tree (CART), minimum distance (MD), and support vector machine (SVM), to the images from 2000 to 2020. With 80% training and 20% testing data, the RF classifier achieved higher accuracy than the other classifiers. The average overall accuracy (OA) and Kappa coefficient (KC) from 2000 to 2020 were 84.00% and 0.79 with the RF classifier. With SVM, CART, and MD, the average OA and KC were 78.06%, 0.73; 78.60%, 0.72; and 73.32%, 0.65, respectively. We evaluated three methods of topographic correction, namely, C-correction, SCS (sun canopy sensor) correction, and SCS + C correction, to reduce misclassification due to shadow effects. SCS + C correction worked best for this region; hence, we prepared the LULC maps on SCS + C corrected satellite images. We therefore used the RF classifier to prepare the demi-decadal LULC maps from 2000 to 2020. The OA for 2000, 2005, 2010, 2015, and 2020 was 84%, 81%, 81%, 85%, and 89%, respectively, using RF. Dense forest decreased from 2000 to 2020, with an increase in open forest, settlement, and agriculture; nevertheless, when farmland was low, bare land increased. Topographic correction significantly improved the results and reduced misclassification.
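
Overall accuracy (OA) and the Kappa coefficient (KC) reported above are both derived from a confusion matrix. The sketch below shows one common way to compute them, assuming Cohen's kappa with chance agreement estimated from the row and column totals; the three-class matrix is hypothetical and does not come from the study, which used six classes.

```python
def overall_accuracy_and_kappa(matrix):
    """matrix[i][j] = number of samples of reference class i predicted as class j."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Observed agreement: share of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(n_classes)) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class confusion matrix (illustrative counts only).
confusion = [
    [40, 5, 5],
    [4, 30, 6],
    [6, 4, 50],
]
oa, kappa = overall_accuracy_and_kappa(confusion)
print(f"OA: {oa:.2%}, Kappa: {kappa:.3f}")
```

Kappa is lower than OA here because it discounts the agreement a classifier would reach by chance given the class proportions.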

Habib N, Saqib M, Najeh T, Gamil Y. Eco-Transformation of construction: Harnessing machine learning and SHAP for crumb rubber concrete sustainability. Heliyon 2024;10:e26927. [PMID: 38463877 PMCID: PMC10920364 DOI: 10.1016/j.heliyon.2024.e26927] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 02/14/2024] [Accepted: 02/21/2024] [Indexed: 03/12/2024]

Abstract

Researchers have focused their efforts on investigating the integration of crumb rubber as a substitute for conventional aggregates and cement in concrete. Nevertheless, the manufacture of crumb rubber concrete (CRC) has been linked to the release of noxious pollutants, presenting potential environmental hazards. Rather than developing novel CRC formulations, the primary objective of this work is to construct an extensive database from prior research. The study places particular emphasis on two crucial concrete properties: compressive strength (fc') and tensile strength (fts). The database includes 456 data points for fc' and 358 data points for fts, focusing on nine essential characteristics that have a substantial impact on both attributes. The research employs several machine learning algorithms, including both individual and ensemble methods, to analyse the databases for fc' and fts. To assess the accuracy of the models, a comparative analysis of machine learning techniques, namely decision tree (DT) and random forest (RF), is conducted using statistical evaluation. Cross-validation is used to address possible overfitting. Furthermore, the Shapley additive explanations (SHAP) approach is used to investigate the influence of input parameters and their interrelationships. The findings demonstrate that the RF methodology outperforms the other ensemble techniques, as shown by its lower error rates and higher coefficient of determination (R²) of 0.87 and 0.85 for fc' and fts, respectively. Among the ensemble approaches, AdaBoost outperforms bagging by 6% for both outcome models and individual decision tree learners by 17% and 21% for fc' and fts, respectively. The average accuracy of the AdaBoost algorithm for both models is 84%.
Significantly, age and the inclusion of crumb rubber in CRC are identified as the primary factors that substantially influence the mechanical properties of this kind of concrete.
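
The abstract above mentions cross-validation as a guard against overfitting without giving details. A minimal sketch of k-fold index splitting follows, assuming contiguous, unshuffled folds and k = 5; both choices are illustrative assumptions, since the abstract does not state the number of folds or the splitting scheme. Only the sample count of 456 (the fc' database size) comes from the text.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 456 matches the fc' database size quoted above; k = 5 is a hypothetical choice.
folds = list(k_fold_indices(456, 5))
for number, (train, test) in enumerate(folds, start=1):
    print(f"fold {number}: {len(train)} training rows, {len(test)} test rows")
```

In practice the row order is usually shuffled before splitting, and each fold's model is scored on its held-out rows so that the averaged score reflects performance on unseen data.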

Xing Y, Jin Y, Liu Y. Construction and comparison of short-term prognosis prediction model based on machine learning in acute ischemic stroke. Heliyon 2024;10:e24232. [PMID: 38234895 PMCID: PMC10792580 DOI: 10.1016/j.heliyon.2024.e24232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 11/25/2023] [Accepted: 01/04/2024] [Indexed: 01/19/2024]

Abstract

Objective

To construct and compare short-term prognosis prediction models for acute ischemic stroke (AIS) using machine learning (ML).

Methods

Retrospective study. Group W (mRS≤3) was clustered and combined with group P (mRS>3) to form the post-clustering dataset for modeling. The "glmnet", "rpart", "xgboost", "randomForest", and "neuralnet" packages were used to construct ML models. The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of the models were compared. Four external clinical datasets were used for external clinical validation. The optimal prediction model was determined by variable screening ability, model visualization, and external clinical validation performance.

Results

The post-clustering dataset contains 139 patients (group W) and 122 patients (group P). The neutrophil count multiplied by D-dimer (NDM) has predictive value in all ML prediction models in this study. In the decision tree model, NDMQ occupies the first tree node. When NDM≤5.62 and age<74.5, the probability of poor prognosis of AIS is less than 20%. When NDM>5.62 and the patient also has pneumonia, the incidence of poor prognosis of AIS is about 90%. In the Random Forest (RF) model, NDMQ had the highest Gini index. The variable combination screened by the RF model had the best performance in the neural network, and the accuracy, sensitivity, specificity, PPV, and NPV of the external validation were 0.800, 0.774, 0.833, 0.857, and 0.741, respectively. The RF model had the best performance in the external clinical validation datasets, with accuracies of 0.646, 0.697, 0.695, and 0.713, respectively.

Conclusions

NDM shows predictive value for AIS short-term prognosis in all ML models in this study. The RF model was the optimal model in terms of both screening characteristic variables and performance in the external clinical datasets. For medical data with a small sample size and a categorical outcome, RF could be used as the main algorithm to build a model.
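
The five metrics compared in this abstract all follow from the four cells of a binary confusion matrix. The sketch below computes them, assuming poor prognosis is treated as the positive class (an assumption; the abstract does not say). The counts are hypothetical: they were chosen only because they round to the external-validation figures quoted above (0.800, 0.774, 0.833, 0.857, 0.741), and the abstract itself does not report a confusion matrix.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard diagnostic metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Hypothetical counts consistent with the reported values; not from the paper.
metrics = binary_metrics(tp=24, fp=4, fn=7, tn=20)
rounded = {name: round(value, 3) for name, value in metrics.items()}
print(rounded)
```

Reporting PPV and NPV alongside sensitivity and specificity matters in small clinical datasets, because the former depend on how common the poor outcome is in the validation cohort.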

Wang Y, Shi F, Yao P, Sheng Y, Zhao C. Assessing the evolution and attribution of watershed resilience in arid inland river basins, Northwest China. Sci Total Environ 2024;906:167534. [PMID: 37797763 DOI: 10.1016/j.scitotenv.2023.167534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/28/2023] [Accepted: 09/30/2023] [Indexed: 10/07/2023]

Abstract

Water scarcity significantly limits the sustainable development of oasis economies in arid inland river basins. Quantifying watershed resilience and its drivers is a major focus in the fields of hydrology and water resources. In this study, the resilience indicator pi represents watershed resilience, while meteorological, hydrological, socioeconomic, and ecological factors are used to investigate the spatial and temporal patterns of resilience and important driving factors in the Hotan River Basin from 1958 to 2020 by combining principal component analysis and random forest model. Results show that the overall resilience of the Hotan River Basin is low, decreasing from the upper (upstream) to the middle and lower (downstream) reaches, and that the intensity of human activities has a negative impact on resilience. Rivers are more likely to reach maximum resilience after experiencing periods of wet and dry conditions, although there is a lag in this progress. The random forest machine learning algorithm was used to accurately predict the resilience levels of the two upstream tributaries Yurungkash and Karakash Rivers, and the downstream Hotan River, with classification accuracies of 84.2 %, 71.4 %, and 87 %, respectively. The factors affecting the resilience of the Yurungkash River are the 30-day maximum, base flow index, low pulse duration, median streamflow in May, median streamflow in August, median streamflow in October, and 7-day maximum. The set of factors used to classify the resilience of the Karakash River include the 7-day maximum, 1-day maximum, median streamflow in June, 30-day maximum, 3-day maximum, median streamflow in February, and autumn temperature. The factors affecting the resilience of the Hotan River are the watershed inflow, Xiaota station runoff, population growth rate, and effective irrigated area. 
The findings of this study provide a theoretical basis for integrated water resource management and the sustainable development of the oasis economy in the Hotan River Basin.

Shen J, Zhang W, Jin Q, Gong F, Zhang H, Xu H, Li J, Yao H, Jiang X, Yang Y, Hong L, Mei J, Song Y, Zhou S. Polo-like kinase 1 as a biomarker predicts the prognosis and immunotherapy of breast invasive carcinoma patients. Oncol Res 2023;32:339-351. [PMID: 38186570 PMCID: PMC10765123 DOI: 10.32604/or.2023.030887] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 08/03/2023] [Indexed: 01/09/2024]

Abstract

Background

Invasive breast carcinoma (BRCA) is associated with poor prognosis and high risk of mortality. Therefore, it is critical to identify novel biomarkers for the prognostic assessment of BRCA.

Methods

The expression data of polo-like kinase 1 (PLK1) in BRCA and the corresponding clinical information were extracted from TCGA and GEO databases. PLK1 expression was validated in diverse breast cancer cell lines by quantitative real-time polymerase chain reaction (qRT-PCR) and western blotting. Single sample gene set enrichment analysis (ssGSEA) was performed to evaluate immune infiltration in the BRCA microenvironment, and the random forest (RF) and support vector machine (SVM) algorithms were used to screen for the hub infiltrating cells and calculate the immunophenoscore (IPS). The RF algorithm and COX regression model were applied to calculate survival risk scores based on the PLK1 expression and immune cell infiltration. Finally, a prognostic nomogram was constructed with the risk score and pathological stage, and its clinical potential was evaluated by plotting calibration charts and DCA curves. The application of the nomogram was further validated in an immunotherapy cohort.

Results

PLK1 expression was significantly higher in the tumor samples in TCGA-BRCA cohort. Furthermore, PLK1 expression level, age and stage were identified as independent prognostic factors of BRCA. While the IPS was unaffected by PLK1 expression, the TMB and MATH scores were higher in the PLK1-high group, and the TIDE scores were higher for the PLK1-low patients. We also identified 6 immune cell types with high infiltration, along with 11 immune cell types with low infiltration in the PLK1-high tumors. A risk score was devised using PLK1 expression and hub immune cells, which predicted the prognosis of BRCA patients. In addition, a nomogram was constructed based on the risk score and pathological staging, and showed good predictive performance.

Conclusions

PLK1 expression and immune cell infiltration can predict post-immunotherapy prognosis of BRCA patients.

Affiliation(s)

Juan Shen: School of Big Data and Artificial Intelligence, Anhui Xinhua University, Hefei, 230088, China
Weiyu Zhang: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Qinqin Jin: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Fuyu Gong: Departments of Breast Surgery, Fuyang Women and Children’s Hospital, Fuyang, 236000, China
Heping Zhang: Departments of Pathology, Anhui Province Maternity and Child Health Hospital, Hefei, 230001, China
Hongliang Xu: Departments of Pathology, Anhui Province Maternity and Child Health Hospital, Hefei, 230001, China
Jiejie Li: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Hui Yao: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Xiya Jiang: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Yinting Yang: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Lin Hong: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Jie Mei: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China
Yang Song: Department of Pain, The First Affiliated Hospital of Anhui Medical University, Hefei, 230032, China
Shuguang Zhou: Department of Gynecology and Obstetrics, Maternity and Child Healthcare Hospital Affiliated to Anhui Medical University, Anhui Province Maternity and Child Healthcare Hospital, Hefei, 230001, China; Department of Gynecology and Obstetrics, The Fifth Clinical College of Anhui Medical University, Hefei, 230032, China; Department of Gynecology and Obstetrics, Linquan Maternity and Child Healthcare Hospital, Fuyang, 236400, China

Inqiad WB, Siddique MS, Alarifi SS, Butt MJ, Najeh T, Gamil Y. Comparative analysis of various machine learning algorithms to predict 28-day compressive strength of Self-compacting concrete. Heliyon 2023;9:e22036. [PMID: 38045144 PMCID: PMC10692774 DOI: 10.1016/j.heliyon.2023.e22036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Revised: 11/02/2023] [Accepted: 11/02/2023] [Indexed: 12/05/2023]

Jiang Z, Yang S, Luo S. Source analysis and health risk assessment of heavy metals in agricultural land of multi-mineral mining and smelting area in the Karst region - a case study of Jichangpo Town, Southwest China. Heliyon 2023;9:e17246. [PMID: 37456041 PMCID: PMC10338313 DOI: 10.1016/j.heliyon.2023.e17246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 06/10/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]

Abstract

In the Karst region of Southwest China, soil heavy metal content is generally high because of the geological background. Moreover, Southwest China is rich in mineral resources, and extensive mining and smelting activities discharge heavy metals into the surrounding soil, causing superimposed pollution that has drawn widespread concern. Because the variation coefficients of soil heavy metals in the Karst region are large, selecting appropriate analysis methods is essential. In this paper, Jichangpo in Puding County, a Karst area with multi-mineral mining and smelting, is selected as the study area, and a total of 368 agricultural topsoil samples are collected. The pollution level of heavy metals in agricultural soil is evaluated by the geoaccumulation index (I_geo) and enrichment factor (EF). Absolute principal component score/multiple linear regression (APCS/MLR), geographic information system (GIS), self-organizing map (SOM), and random forest (RF) are used for source apportionment of soil heavy metals. Finally, APCS/MLR is combined with a health risk assessment model to evaluate the risks of heavy metal sources and determine the priority-control source. The results show that the average values of soil heavy metals in the study area (Cd, Hg, As, Pb, Cr, Cu, Zn, and Ni) exceed the background values of the corresponding elements in Guizhou Province. Three sources of heavy metals are identified by combining APCS/MLR, GIS, SOM, and RF. Zn (63.47%), Pb (55.77%), Cd (58.98%), Hg (32.17%), Cu (14.41%), and As (5.99%) are related to lead-zinc mining and smelting; Cr (98.14%), Ni (90.64%), Cu (76.93%), Pb (43.02%), Zn (35.22%), Cd (28.97%), Hg (22.44%), and As (5.84%) are from mixed sources (natural and agricultural sources); As (88.17%), Hg (45.39%), Cd (12.04%), Cu (8.66%), and Ni (6.72%) are related to the mining and smelting of coal and iron.
The results of the health risk assessment show that only As poses a non-carcinogenic risk to human health: 3.31% of the sampling points pose non-carcinogenic risks from As to adults and 10.22% to children. In terms of carcinogenic risks, As, Pb, and Cr pose carcinogenic risks to adults and children. Based on APCS/MLR combined with the health risk assessment model, the mining and smelting of coal and iron is the priority-control pollution source. This paper provides a comprehensive method for studying the distribution of heavy metal sources in Karst areas where soil heavy metals have large variation coefficients. Furthermore, it offers a theoretical basis for the management and assessment of heavy metal pollution in agricultural land in the study area, which helps researchers make strategic decisions on food security when selecting agricultural land.
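
The two pollution indices named in this abstract have widely used closed forms. As a hedged sketch, the geoaccumulation index is commonly written as I_geo = log2(C / (1.5 × B)), where C is the measured concentration and B the geochemical background, and the enrichment factor as the metal-to-reference-element ratio in the sample divided by the same ratio in the background. Whether the authors used exactly these forms, and which reference element they chose, is not stated in the abstract; all numbers below are hypothetical.

```python
import math


def geoaccumulation_index(concentration, background):
    """I_geo = log2(C / (1.5 * B)); the factor 1.5 allows for natural variation."""
    return math.log2(concentration / (1.5 * background))


def enrichment_factor(metal_sample, reference_sample, metal_background, reference_background):
    """Ratio of metal/reference in the sample to metal/reference in the background."""
    return (metal_sample / reference_sample) / (metal_background / reference_background)


# Hypothetical concentrations in mg/kg; not taken from the study.
igeo = geoaccumulation_index(concentration=3.0, background=0.5)
ef = enrichment_factor(
    metal_sample=2.0,
    reference_sample=4.0,
    metal_background=1.0,
    reference_background=8.0,
)
print(f"I_geo = {igeo:.2f}, EF = {ef:.2f}")
```

Both indices compare a sample against a regional background, which is why the abstract's comparison with Guizhou Province background values matters for interpreting them.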

Wang Z, Wang J, Yu D, Chen K. The potential evaluation of groundwater by integrating rank sum ratio (RSR) and machine learning algorithms in the Qaidam Basin. Environ Sci Pollut Res Int 2023;30:63991-64005. [PMID: 37059956 DOI: 10.1007/s11356-023-26961-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2022] [Accepted: 04/08/2023] [Indexed: 04/16/2023]

Abstract

Groundwater is a vital resource in arid areas that sustains local industrial development and environmental preservation. Mapping groundwater potential zones and determining high-potential regions are essential for the responsible use of the local groundwater resource. When utilizing machine learning or deep learning algorithms to forecast groundwater potential in arid areas, difficulties such as inaccurate and overfitting predictions might occur due to a shortage of borehole samples. In this study, a database of groundwater conditioning factors with a size of 275,157 × 9 was created in the Qaidam Basin, and 85 known borehole samples were collected. The groundwater potential was evaluated using a combination of rank sum ratio (RSR), projection pursuit regression (PPR) and random forest (RF) algorithms, resulting in four models: PPR, RSR-PPR, RSR-RF, and RF. Results indicated that the groundwater potential was higher in mountainous regions surrounding the Qaidam Basin and decreased progressively towards the central and northwestern regions where most industries and facilities are located. The two primary factors, according to the PPR and RF models, were evapotranspiration (0.246, 0.225) and landform (0.176, 0.294). In terms of their ability to accurately forecast the borehole samples, the four models ranked as follows: RF > RSR-RF > RSR-PPR > PPR. The accuracy of the four models in the low-potential area was 0.73 (PPR), 0.60 (RSR-PPR), 0.87 (RSR-RF), and 0.80 (RF), respectively. However, the RF model showed overfitting due to a lack of samples, especially in high-potential regions, which limits its applicability. The RSR-RF method was applied directly to evaluate the entire factor database, avoiding the risk of overfitting caused by a limited number of training samples. The results demonstrate that the RSR-RF model is effective for classifying groundwater potential types in samples and mapping groundwater potential of the study area. 
This research presents a novel approach for groundwater potential predictions in areas with insufficient sample sizes, providing a reference for policymakers and researchers.
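
The rank sum ratio (RSR) used in this study is, in its common unweighted form, the sum of an object's ranks across all indicators divided by the product of the number of indicators and the number of objects. The sketch below implements that form, assuming higher indicator values are better and ties share their average rank; how the authors ranked or weighted their nine conditioning factors is not described in the abstract, so the scheme and the small table are hypothetical.

```python
def average_ranks(values):
    """Rank values ascending from 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sum_ratio(table):
    """table[i][j] = value of indicator j for object i; returns one RSR per object."""
    n_objects = len(table)
    n_indicators = len(table[0])
    columns = [average_ranks([row[j] for row in table]) for j in range(n_indicators)]
    return [
        sum(columns[j][i] for j in range(n_indicators)) / (n_indicators * n_objects)
        for i in range(n_objects)
    ]


# Hypothetical table: three grid cells scored on two conditioning factors.
rsr = rank_sum_ratio([[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]])
print([round(value, 3) for value in rsr])
```

Because RSR is computed from ranks over the whole factor table, it needs no labelled boreholes, which is consistent with the abstract's point about avoiding overfitting when training samples are scarce.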

Shi W, Wu W, Zhang L, Jia Q, Tan J, Zheng W, Li N, Xu K, Meng Z. Prognosis of thyroid carcinoma patients with osseous metastases: an SEER-based study with machine learning. Ann Nucl Med 2023. [PMID: 36867400 DOI: 10.1007/s12149-023-01826-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 02/09/2023] [Indexed: 03/04/2023]

Elbeltagi A, Pande CB, Kumar M, Tolche AD, Singh SK, Kumar A, Vishwakarma DK. Prediction of meteorological drought and standardized precipitation index based on the random forest (RF), random tree (RT), and Gaussian process regression (GPR) models. Environ Sci Pollut Res Int 2023;30:43183-43202. [PMID: 36648725 DOI: 10.1007/s11356-023-25221-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2022] [Accepted: 01/05/2023] [Indexed: 06/17/2023]

Abstract

Agricultural, meteorological, and hydrological droughts are natural hazards that affect ecosystems in Maharashtra state, central India. Because limited historical data are available for drought monitoring and forecasting in this region, machine learning (ML) algorithms could enable the prediction of future drought events. In this paper, we focus on the prediction accuracy of meteorological drought in the semi-arid region based on the standardized precipitation index (SPI) using the random forest (RF), random tree (RT), and Gaussian process regression (GPR-PUK kernel) models. Different combinations of machine learning models and variables were tested for forecasting meteorological drought based on the 6- and 12-month SPI (SPI-6 and SPI-12). Models were developed using monthly rainfall data for 2000-2019 at two meteorological stations, Karanjali and Gangawdi, each representing a geographical region of the Upper Godavari river basin in Maharashtra state, which frequently experiences droughts. SPI data from 2000 to 2013 were used to train the machine learning models, and the remaining data from 2014 to 2019 were used for testing the SPI and meteorological drought forecasts. The mean square error (MSE), root mean square error (RMSE), adjusted R², Mallows' Cp, Akaike's information criterion (AIC), Schwarz's Bayesian criterion (SBC), and Amemiya's PC were used to identify the best input combination and the best subset regression for SPI-6 and SPI-12 at both stations. The correlation coefficient ([Formula: see text]), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE), and root relative squared error (RRSE) were used to evaluate the RF, RT, and GPR-PUK kernel models for SPI-6 and SPI-12 at both stations during the training and testing scenarios.
Testing-phase results showed that RF was the best model for forecasting droughts at Gangawdi station, with [Formula: see text], MAE, RMSE, RAE (%), and RRSE (%) values of 0.856, 0.551, 0.718, 74.778, and 54.019, respectively, for SPI-6 and 0.961, 0.361, 0.538, 34.926, and 28.262, respectively, for SPI-12. At Karanjali station, the respective values were 0.913 and 0.966, 0.541 and 0.386, 0.604 and 0.589, 52.592 and 36.959, and 42.315 and 31.394 for the PUK kernel and RT models, respectively, during SPI-6 and SPI-12. Machine learning models are potential drought warning techniques because they take less time, require fewer inputs, and are less complex than dynamic or scientific models.
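
The four error measures reported above can be computed from paired actual and predicted values. The sketch below assumes the conventional definitions, in which RAE and RRSE normalise the model's error by the error of always predicting the mean of the actual values; the abstract does not spell out its formulas, and the SPI-like numbers are hypothetical.

```python
import math


def error_metrics(actual, predicted):
    """MAE, RMSE, and the mean-baseline-relative RAE and RRSE (in percent)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equally long")
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))
    # Errors of the naive baseline that always predicts the mean.
    abs_dev = sum(abs(a - mean_actual) for a in actual)
    sq_dev = sum((a - mean_actual) ** 2 for a in actual)
    return {
        "mae": abs_err / n,
        "rmse": math.sqrt(sq_err / n),
        "rae_percent": 100 * abs_err / abs_dev,
        "rrse_percent": 100 * math.sqrt(sq_err / sq_dev),
    }


# Hypothetical SPI values; not taken from the study.
actual = [-1.0, 0.0, 1.0, 2.0]
predicted = [-0.5, 0.5, 0.5, 1.5]
scores = error_metrics(actual, predicted)
print({name: round(value, 3) for name, value in scores.items()})
```

Values of RAE or RRSE below 100% indicate that the model beats the mean baseline, which gives context for figures such as the 74.778% RAE quoted for SPI-6.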

Pourhashemi S, Asadi MAZ, Boroughani M, Azadi H. Mapping of dust source susceptibility by remote sensing and machine learning techniques (case study: Iran-Iraq border). Environ Sci Pollut Res Int 2023;30:27965-27979. [PMID: 36394809 DOI: 10.1007/s11356-022-23982-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/12/2022] [Accepted: 10/30/2022] [Indexed: 06/16/2023]

Ampadi Ramachandran R, Chi SW, Srinivasa Pai P, Foucher K, Ozevin D, Mathew MT. Artificial intelligence and machine learning as a viable solution for hip implant failure diagnosis-Review of literature and in vitro case study. Med Biol Eng Comput 2023. [PMID: 36701013 DOI: 10.1007/s11517-023-02779-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2022] [Accepted: 01/09/2023] [Indexed: 01/27/2023]

Shi T, Zhang J, Shen W, Wang J, Li X. Machine learning can identify the sources of heavy metals in agricultural soil: A case study in northern Guangdong Province, China. Ecotoxicol Environ Saf 2022;245:114107. [PMID: 36152430 DOI: 10.1016/j.ecoenv.2022.114107] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/23/2022] [Revised: 09/06/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]

Chengaiyan S, Anandan K. Effect of functional and effective brain connectivity in identifying vowels from articulation imagery procedures. Cogn Process 2022. [PMID: 35794496 DOI: 10.1007/s10339-022-01103-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2020] [Accepted: 06/15/2022] [Indexed: 11/03/2022]

Lee J, Lee S, Street WN, Polgreen LA. Machine learning approaches to predict the 1-year-after-initial-AMI survival of elderly patients. BMC Med Inform Decis Mak 2022;22:115. [PMID: 35488291 PMCID: PMC9052482 DOI: 10.1186/s12911-022-01854-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Accepted: 04/11/2022] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

While multiple randomized controlled trials (RCTs) are available, their results may not be generalizable to older, unhealthier, or less-adherent patients. Observational data can be used to predict outcomes and evaluate treatments; however, it is currently unclear which strategy should be used to analyze treatment outcomes from observational data. This study aimed to determine the most accurate machine learning technique for predicting 1-year-after-initial-acute-myocardial-infarction (AMI) survival of elderly patients and to identify the association of angiotensin-converting-enzyme inhibitors and angiotensin-receptor blockers (ACEi/ARBs) with survival.

METHODS

We built a cohort of 124,031 Medicare beneficiaries who experienced an AMI in 2007 or 2008. For analytical purposes, all variables were categorized into nine groups: ACEi/ARB use, demographics, cardiac events, comorbidities, complications, procedures, medications, insurance, and healthcare utilization. Our outcome of interest was 1-year-post-AMI survival. To solve this classification task, we used lasso logistic regression (LLR) and random forest (RF), and compared their performance depending on category selection, sampling methods, and hyper-parameter selection. Nested 10-fold cross-validation was implemented to obtain an unbiased estimate of performance. We used the area under the receiver operating characteristic curve (AUC) as our primary measure for evaluating the performance of the predictive algorithms.
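As an illustration of the primary measure named above, the AUC can be computed as the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as half. This is a minimal sketch, not the authors' code; the function name `roc_auc` and the 0/1 label encoding are assumptions.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 outcomes (1 = positive class, e.g. survived).
    scores: predicted scores, higher meaning more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of cases, so for a cohort of this size a rank-based implementation from a statistics library would be the practical choice; the sketch only shows what the number means.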

RESULTS

LLR consistently showed the best AUC results throughout the experiments, closely followed by RF. The best prediction was obtained with LLR based on the combination of demographics, comorbidities, procedures, and utilization. The coefficients from the final LLR model showed that AMI patients with many comorbidities, of older age, or living in a low-income area have a higher risk of mortality 1 year after an AMI. In addition, treating AMI patients with ACEi/ARBs increases their 1-year-after-initial-AMI survival rate.

CONCLUSIONS

Given the many features we examined, ACEi/ARBs were associated with increased 1-year survival among elderly patients after an AMI. We found LLR to be the best-performing model over RF to predict 1-year survival after an AMI. LLR greatly improved the generalization of the model by feature selection, which implicitly indicates the association between AMI-related variables and survival can be defined by a relatively simple model with a small number of features. Some comorbidities were associated with a greater risk of mortality, such as heart failure and chronic kidney disease, but others were associated with survival such as hypertension, hyperlipidemia, and diabetes. In addition, patients who live in urban areas and areas with large numbers of immigrants have a higher probability of survival. Machine learning methods are helpful to determine outcomes when RCT results are not available.


Wang X, Zhang C, Wang C, Liu G, Wang H. GIS-based for prediction and prevention of environmental geological disaster susceptibility: From a perspective of sustainable development. Ecotoxicol Environ Saf 2021;226:112881. [PMID: 34634737 DOI: 10.1016/j.ecoenv.2021.112881] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/24/2021] [Revised: 09/21/2021] [Accepted: 10/06/2021] [Indexed: 06/13/2023]

Alghamdi W, Alzahrani E, Ullah MZ, Khan YD. 4mC-RF: Improving the prediction of 4mC sites using composition and position relative features and statistical moment. Anal Biochem 2021;633:114385. [PMID: 34571005 DOI: 10.1016/j.ab.2021.114385] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 09/09/2021] [Accepted: 09/13/2021] [Indexed: 01/28/2023]

Aher RB, Sarkar D. 2D-QSAR modeling and two-fold classification of 1,2,4-triazole derivatives for antitubercular potency against the dormant stage of Mycobacterium tuberculosis. Mol Divers 2021. [PMID: 34347229 DOI: 10.1007/s11030-021-10254-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2021] [Accepted: 06/14/2021] [Indexed: 10/20/2022]

Abstract

The dormant or latent form of Mycobacterium tuberculosis (MTB) is not killed by conventional antitubercular drugs. Treating latent TB is essential to shorten the treatment period and to reduce the incidence of drug resistance. Against this background, we developed quantitative structure-activity relationship (QSAR) models, both regression and classification based, against the dormant form of MTB, and then used the resulting classifier models (linear discriminant analysis (LDA) and random forest (RF)) for two-fold classification. The rationale for applying two-fold classification to MTB modeling is to increase confidence in correct classification. The 2D-QSAR modeling suggested that Burden eigenvalue, edge adjacency, van der Waals (vdW) surface area, topological charge, and pharmacophoric indices contribute to predicting antitubercular activity against dormant MTB. The prediction qualities of the training and test sets were found to be moderate and good, according to mean absolute error (MAE)-based criteria. The LDA and RF models revealed the importance of Burden eigenvalue, edge adjacency, Geary autocorrelation, and drug-like indices as discriminating features for separating the antitubercular compounds into higher and lower activity groups. The LDA model showed classification accuracies of 85.14% and 87.10% for the training and test sets, while the RF model achieved 100.00% and 80.65% on the same two sets. The descriptors selected in the final models are all two-dimensional (2D); they are easy to compute and do not require the computationally expensive steps of structure conversion, optimization, and energy minimization that must precede the computation of 3D descriptors. These models could be used for identifying and selecting more active compounds against the dormant form of MTB.


Quddus A, Shahidi Zandi A, Prest L, Comeau FJE. Using long short term memory and convolutional neural networks for driver drowsiness detection. Accid Anal Prev 2021;156:106107. [PMID: 33848710 DOI: 10.1016/j.aap.2021.106107] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/23/2019] [Revised: 07/19/2020] [Accepted: 03/27/2021] [Indexed: 06/12/2023]

Abstract

Fatigue negatively affects the safety and performance of drivers on the road. In fact, drowsiness and fatigue are the cause of a substantial number of motor vehicle accidents. Drowsiness among drivers can be detected using a variety of modalities, including electroencephalogram (EEG), eye movement, and vehicle driving dynamics. Among these, EEG is highly accurate but very intrusive and cumbersome. Vehicle driving dynamics, on the other hand, are very easy to acquire, but their accuracy is not very high. An eye-movement-based approach is attractive as a balance between these two extremes. However, eye-movement-based techniques normally require an eye tracking device, which consists of a high-speed camera and a sophisticated algorithm to extract eye-movement-related parameters such as blinking, eye closure, saccades, and fixation. This makes eye-tracking-based drowsiness detection difficult to implement as a practical system, especially on an embedded platform. In this paper, the authors propose using eye images from a camera directly, without the need for an expensive eye-tracking system. Here, eye-related movements are captured by a Recurrent Neural Network (RNN) to detect drowsiness. Long Short Term Memory (LSTM) is a class of RNN that has several advantages over vanilla RNNs. In this work, an array of LSTM cells is used to model the eye movements. Two types of LSTM were employed: a 1-D LSTM (R-LSTM), used as the baseline, and a convolutional LSTM (C-LSTM), which allows 2-D images to be used directly. Patches of size 48 × 48 around each eye were extracted from 38 subjects participating in a simulated driving experiment. The state of vigilance of the subjects was independently assessed by power spectral analysis of multichannel electroencephalogram (EEG) signals, recorded simultaneously, and binary labels of alert and drowsy (baseline) were generated. Results show high efficacy of the proposed system. The R-LSTM based approach resulted in an accuracy of around 82%, and the C-LSTM based approach resulted in an accuracy in the range of 95%-97%. A comparison is also provided with a recently published eye-tracking based approach, showing that the proposed LSTM technique outperforms it by a wide margin.


Sandhu H, Kumar RN, Garg P. Machine learning-based modeling to predict inhibitors of acetylcholinesterase. Mol Divers 2021;26:331-340. [PMID: 33891263 DOI: 10.1007/s11030-021-10223-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2021] [Accepted: 04/02/2021] [Indexed: 12/29/2022]

Islam ARMT, Hasanuzzaman M, Shammi M, Salam R, Bodrud-Doza M, Rahman MM, Mannan MA, Huq S. Are meteorological factors enhancing COVID-19 transmission in Bangladesh? Novel findings from a compound Poisson generalized linear modeling approach. Environ Sci Pollut Res Int 2021;28:11245-11258. [PMID: 33118070 PMCID: PMC7594949 DOI: 10.1007/s11356-020-11273-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/28/2020] [Accepted: 10/15/2020] [Indexed: 05/06/2023]

Wang H, Qin Z, Yan A. Classification models and SAR analysis on CysLT1 receptor antagonists using machine learning algorithms. Mol Divers 2021;25:1597-1616. [PMID: 33534023 DOI: 10.1007/s11030-020-10165-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2020] [Accepted: 11/27/2020] [Indexed: 12/21/2022]

Kwarteng EVS, Andam-Akorful SA, Kwarteng A, Asare DCB, Quaye-Ballard JA, Osei FB, Duker AA. Spatial variation in lymphatic filariasis risk factors of hotspot zones in Ghana. BMC Public Health 2021;21:230. [PMID: 33509140 PMCID: PMC7841995 DOI: 10.1186/s12889-021-10234-9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2020] [Accepted: 01/13/2021] [Indexed: 12/29/2022] Open

Abstract

BACKGROUND

Lymphatic Filariasis (LF), a parasitic nematode infection, poses a huge economic burden to affected countries. LF endemicity is localized and its prevalence is spatially heterogeneous. In Ghana, there exist differences in LF prevalence and multiplicity of symptoms between the country's northern and southern parts. Species distribution models (SDMs) have been utilized to explore the suite of risk factors that influence the transmission of LF in these geographically distinct regions.

METHODS

Presence-absence records of microfilaria (mf) cases were stratified into northern and southern zones and used to run SDMs, while climate, socioeconomic, and land cover variables provided explanatory information. Generalized Linear Model (GLM), Generalized Boosted Model (GBM), Artificial Neural Network (ANN), Surface Range Envelope (SRE), Multivariate Adaptive Regression Splines (MARS), and Random Forests (RF) algorithms were run for both study zones and also for the entire country for comparison.

RESULTS

The best model quality was obtained with the RF and GBM algorithms, which had the highest Area under the Curve (AUC) values of 0.98 and 0.95, respectively. The models predicted highly suitable environments for LF transmission in the short grass savanna (northern) and coastal (southern) areas of Ghana. Mainly, land cover and socioeconomic variables such as proximity to inland water bodies and population density uniquely influenced LF transmission in the south. At the same time, poor housing was a distinctive risk factor in the north. Precipitation, temperature, slope, and poverty were common risk factors but with subtle variations in response values, which were confirmed by the countrywide model.

CONCLUSIONS

This study has demonstrated that different variable combinations influence the occurrence of lymphatic filariasis in northern and southern Ghana. Thus, an understanding of the geographic distinctness in risk factors is required to inform on the development of area-specific transmission control systems towards LF elimination in Ghana and internationally.


Idakwo G, Thangapandian S, Luttrell J, Li Y, Wang N, Zhou Z, Hong H, Yang B, Zhang C, Gong P. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets. J Cheminform 2020;12:66. [PMID: 33372637 PMCID: PMC7592558 DOI: 10.1186/s13321-020-00468-x] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2019] [Accepted: 10/13/2020] [Indexed: 12/14/2022] Open

Abstract

The specificity of toxicant-target biomolecule interactions lends to the very imbalanced nature of many toxicity datasets, causing poor performance in Structure–Activity Relationship (SAR)-based chemical classification. Undersampling and oversampling are representative techniques for handling such an imbalance challenge. However, removing inactive chemical compound instances from the majority class using an undersampling technique can result in information loss, whereas increasing active toxicant instances in the minority class by interpolation tends to introduce artificial minority instances that often cross into the majority class space, giving rise to class overlapping and a higher false prediction rate. In this study, in order to improve the prediction accuracy of imbalanced learning, we employed SMOTEENN, a combination of Synthetic Minority Over-sampling Technique (SMOTE) and Edited Nearest Neighbor (ENN) algorithms, to oversample the minority class by creating synthetic samples, followed by cleaning the mislabeled instances. We chose the highly imbalanced Tox21 dataset, which consisted of 12 in vitro bioassays for > 10,000 chemicals that were distributed unevenly between binary classes. With Random Forest (RF) as the base classifier and bagging as the ensemble strategy, we applied four hybrid learning methods, i.e., RF without imbalance handling (RF), RF with Random Undersampling (RUS), RF with SMOTE (SMO), and RF with SMOTEENN (SMN). The performance of the four learning methods was compared using nine evaluation metrics, among which F₁ score, Matthews correlation coefficient and Brier score provided a more consistent assessment of the overall performance across the 12 datasets. The Friedman’s aligned ranks test and the subsequent Bergmann-Hommel post hoc test showed that SMN significantly outperformed the other three methods. 
We also found that a strong negative correlation existed between the prediction accuracy and the imbalance ratio (IR), which is defined as the number of inactive compounds divided by the number of active compounds. SMN became less effective when IR exceeded a certain threshold (e.g., > 28). The ability to separate the few active compounds from the vast amounts of inactive ones is of great importance in computational toxicology. This work demonstrates that the performance of SAR-based, imbalanced chemical toxicity classification can be significantly improved through the use of data rebalancing.
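The imbalance ratio defined above, together with three of the metrics the abstract singles out (F₁ score, Matthews correlation coefficient, Brier score), can be written out in a few lines. This is an illustrative sketch using the standard textbook formulas; the function names and the 0/1 encoding (1 = active, 0 = inactive) are assumptions, and returning 0.0 when a denominator is zero is a convention chosen here, not something stated in the abstract.

```python
import math


def imbalance_ratio(labels):
    """IR = number of inactive compounds / number of active compounds."""
    active = sum(1 for y in labels if y == 1)
    inactive = sum(1 for y in labels if y == 0)
    return inactive / active


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(y_true, y_pred):
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true, probabilities):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probabilities)) / len(y_true)
```

Plain accuracy is dominated by the inactive class when IR is large, which is one reason metrics such as F₁ and MCC that account for false positives and false negatives are preferred for datasets like these.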


Zhao SS, Feng XL, Hu YC, Han Y, Tian Q, Sun YZ, Zhang J, Ge XW, Cheng SC, Li XL, Mao L, Shen SN, Yan LF, Cui GB, Wang W. Better efficacy in differentiating WHO grade II from III oligodendrogliomas with machine-learning than radiologist's reading from conventional T1 contrast-enhanced and fluid attenuated inversion recovery images. BMC Neurol 2020;20:48. [PMID: 32033580 PMCID: PMC7007642 DOI: 10.1186/s12883-020-1613-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2019] [Accepted: 01/13/2020] [Indexed: 12/13/2022] Open

Affiliation(s)

Sha-Sha Zhao Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Xiu-Long Feng Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Yu-Chuan Hu Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Yu Han Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Qiang Tian Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Ying-Zhi Sun Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Jie Zhang Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Xiang-Wei Ge Student Brigade, Air Force Medical University, Xi'an, 710032, Shaanxi, China
Si-Chao Cheng Student Brigade, Air Force Medical University, Xi'an, 710032, Shaanxi, China
Xiu-Li Li Deepwise AI Lab, Deepwise Inc, No.8 Haidian avenue, Sinosteel International Plaza, Beijing, 100080, China
Li Mao Deepwise AI Lab, Deepwise Inc, No.8 Haidian avenue, Sinosteel International Plaza, Beijing, 100080, China
Shu-Ning Shen Department of Stomatology, PLA 984 Hospital, Beijing, China
Lin-Feng Yan Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Guang-Bin Cui Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China
Wen Wang Department of Radiology & Functional and Molecular Imaging Key Lab of Shaanxi Province, Tangdu Hospital, Air Force Medical University, 569 Xinsi Road, Xi'an, 710038, Shaanxi, People's Republic of China.


Fang CH, Theera-Ampornpunt N, Roth MA, Grama A, Chaterji S. AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU. BMC Bioinformatics 2019;20:488. [PMID: 31590652 PMCID: PMC6781298 DOI: 10.1186/s12859-019-3049-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/02/2019] [Accepted: 08/22/2019] [Indexed: 12/02/2022] Open

Abstract

Background

The data deluge can leverage sophisticated ML techniques for functionally annotating the regulatory non-coding genome. The challenge lies in selecting the appropriate classifier for the specific functional annotation problem, within the bounds of the hardware constraints and the model’s complexity. In our system Aikyatan, we annotate distal epigenomic regulatory sites, e.g., enhancers. Specifically, we develop a binary classifier that classifies genome sequences as distal regulatory regions or not, given their histone modifications’ combinatorial signatures. This problem is challenging because the regulatory regions are distal to the genes, with diverse signatures across classes (e.g., enhancers and insulators) and even within each class (e.g., different enhancer sub-classes).

Results

We develop a suite of ML models, under the banner Aikyatan, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection. We demonstrate, with strong empirical evidence, that deep learning approaches have a computational advantage. In addition, convolutional neural networks (CNN) provide the best-in-class accuracy, superior to the vanilla variant. With the human embryonic cell line H1, CNN achieves an accuracy of 97.9% and an order of magnitude lower runtime than the kernel SVM. Running on a GPU, the training time is sped up 21x and 30x (over CPU) for DNN and CNN, respectively. Finally, our CNN model enjoys superior prediction performance vis-à-vis the competition. Specifically, Aikyatan-CNN achieved a 40% higher validation rate than CSIANN and the same accuracy as RFECS.

Conclusions

Our exhaustive experiments using an array of ML tools validate the need for a model that is not only expressive but can scale with increasing data volumes and diversity. In addition, a subset of these datasets have image-like properties and benefit from spatial pooling of features. Our Aikyatan suite leverages diverse epigenomic datasets that can then be modeled using CNNs with optimized activation and pooling functions. The goal is to capture the salient features of the integrated epigenomic datasets for deciphering the distal (non-coding) regulatory elements, which have been found to be associated with functional variants. Our source code will be made publicly available at: https://bitbucket.org/cellsandmachines/aikyatan.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-3049-1) contains supplementary material, which is available to authorized users.


Wang P, Hu J. A hybrid model for EEG-based gender recognition. Cogn Neurodyn 2019;13:541-554. [PMID: 31741691 PMCID: PMC6825103 DOI: 10.1007/s11571-019-09543-y] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2018] [Revised: 06/01/2019] [Accepted: 06/10/2019] [Indexed: 11/29/2022] Open

Eneanya OA, Cano J, Dorigatti I, Anagbogu I, Okoronkwo C, Garske T, Donnelly CA. Environmental suitability for lymphatic filariasis in Nigeria. Parasit Vectors 2018;11:513. [PMID: 30223860 PMCID: PMC6142334 DOI: 10.1186/s13071-018-3097-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 09/04/2018] [Indexed: 12/02/2022] Open

Zhang HH, Yang L, Liu Y, Wang P, Yin J, Li Y, Qiu M, Zhu X, Yan F. Classification of Parkinson's disease utilizing multi-edit nearest-neighbor and ensemble learning algorithms with speech samples. Biomed Eng Online 2016;15:122. [PMID: 27852279 PMCID: PMC5112697 DOI: 10.1186/s12938-016-0242-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2016] [Accepted: 11/07/2016] [Indexed: 11/10/2022] Open

Abstract

Background

The use of speech-based data in the classification of Parkinson's disease (PD) has been shown in recent years to provide an effective, non-invasive mode of classification. Thus, there has been increased interest in speech pattern analysis methods applicable to Parkinsonism for building predictive tele-diagnosis and tele-monitoring models. One of the obstacles in optimizing classification is reducing noise within the collected speech samples, thus ensuring better classification accuracy and stability. While the currently used methods are effective, the ability to invoke instance selection has seldom been examined.

Methods

In this study, a PD classification algorithm that combines a multi-edit-nearest-neighbor (MENN) algorithm and an ensemble learning algorithm was proposed and examined. First, the MENN algorithm is applied iteratively to select optimal training speech samples, thereby obtaining samples with high separability. Next, an ensemble learning algorithm, random forest (RF) or decorrelated neural network ensembles (DNNE), is trained on the selected training samples. Lastly, the trained ensemble learning algorithms are applied to the test samples for PD classification. The proposed method was examined using a more recently deposited public dataset and compared against other currently used algorithms for validation.
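The instance-selection step can be sketched along the lines of multi-edit nearest-neighbor editing as it is commonly described: split the training set into subsets, classify each subset with a 1-nearest-neighbor rule that uses the next subset as reference, drop the misclassified samples, and repeat until several consecutive rounds remove nothing. The abstract does not give the authors' exact variant, so the function names, the number of subsets, the stopping rule (`patience`), and the squared Euclidean distance below are all assumptions.

```python
import random


def nearest_label(x, reference):
    """Label of the reference sample closest to x (squared Euclidean)."""
    closest = min(
        reference,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(x, item[0])),
    )
    return closest[1]


def multi_edit(samples, n_subsets=3, patience=3, seed=0):
    """Iteratively remove samples that a 1-NN rule misclassifies.

    samples: list of (feature_tuple, label) pairs.
    n_subsets: number of random subsets per round (must be >= 2).
    patience: stop after this many consecutive rounds with no removals.
    """
    rng = random.Random(seed)
    current = list(samples)
    quiet_rounds = 0

    while quiet_rounds < patience and len(current) >= n_subsets:
        rng.shuffle(current)
        # Interleaved slicing gives n_subsets non-empty random subsets
        subsets = [current[i::n_subsets] for i in range(n_subsets)]

        kept = []
        for i, subset in enumerate(subsets):
            # Each subset is judged against the *next* subset, never itself
            reference = subsets[(i + 1) % n_subsets]
            kept.extend(s for s in subset if nearest_label(s[0], reference) == s[1])

        quiet_rounds = quiet_rounds + 1 if len(kept) == len(current) else 0
        current = kept

    return current
```

The retained samples would then be passed to the ensemble learner (RF or DNNE in the abstract) as its training set.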

Results

Experimental results showed that the proposed algorithm obtained the highest degree of improved classification accuracy (29.44%) compared with the other algorithm that was examined. Furthermore, the MENN algorithm alone was found to improve classification accuracy by as much as 45.72%. Moreover, the proposed algorithm was found to exhibit a higher stability, particularly when combining the MENN and RF algorithms.

Conclusions

This study showed that the proposed method could improve PD classification when using speech data and can be applied to future studies seeking to improve PD classification methods.


Biscarini F, Nazzicari N, Broccanello C, Stevanato P, Marini S. "Noisy beets": impact of phenotyping errors on genomic predictions for binary traits in Beta vulgaris. Plant Methods 2016;12:36. [PMID: 27437026 PMCID: PMC4949885 DOI: 10.1186/s13007-016-0136-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/01/2016] [Accepted: 07/06/2016] [Indexed: 06/06/2023]

Roy PK, Bhuiyan A, Janke A, Desmond PM, Wong TY, Abhayaratna WP, Storey E, Ramamohanarao K. Automatic white matter lesion segmentation using contrast enhanced FLAIR intensity and Markov Random Field. Comput Med Imaging Graph 2015;45:102-11. [PMID: 26398564 DOI: 10.1016/j.compmedimag.2015.08.005] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [What about the content of this article? (0)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2014] [Revised: 08/08/2015] [Accepted: 08/18/2015] [Indexed: 11/24/2022]

Cao DS, Zhang LX, Tan GS, Xiang Z, Zeng WB, Xu QS, Chen AF. Computational Prediction of Drug-Target Interactions Using Chemical, Biological, and Network Features. Mol Inform 2014;33:669-81. [PMID: 27485302 DOI: 10.1002/minf.201400009] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [What about the content of this article? (0)] [Affiliation(s)] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2014] [Accepted: 04/22/2014] [Indexed: 02/02/2023]