Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Bermingham ML, Pong-Wong R, Spiliopoulou A, Hayward C, Rudan I, Campbell H, Wright AF, Wilson JF, Agakov F, Navarro P, Haley CS. Application of high-dimensional feature selection: evaluation for genomic prediction in man. Sci Rep 2015;5:10312. [PMID: 25988841 PMCID: PMC4437376 DOI: 10.1038/srep10312] [Citation(s) in RCA: 126] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2014] [Accepted: 04/08/2015] [Indexed: 01/20/2023] Open

Cited by Other Article(s)

Al‐Mamun HA, Danilevicz MF, Marsh JI, Gondro C, Edwards D. Exploring genomic feature selection: A comparative analysis of GWAS and machine learning algorithms in a large-scale soybean dataset. THE PLANT GENOME 2025;18:e20503. [PMID: 39253773 PMCID: PMC11726426 DOI: 10.1002/tpg2.20503] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/12/2024] [Revised: 07/15/2024] [Accepted: 07/15/2024] [Indexed: 09/11/2024]

Marchitelli S, Mazza C, Ricci E, Faia V, Biondi S, Colasanti M, Cardinale A, Roma P, Tambelli R. Identification of Psychological Treatment Dropout Predictors Using Machine Learning Models on Italian Patients Living with Overweight and Obesity Ineligible for Bariatric Surgery. Nutrients 2024;16:2605. [PMID: 39203742 PMCID: PMC11357013 DOI: 10.3390/nu16162605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2024] [Revised: 07/30/2024] [Accepted: 08/02/2024] [Indexed: 09/03/2024] Open

Montesinos-López OA, Crespo-Herrera L, Pierre CS, Cano-Paez B, Huerta-Prado GI, Mosqueda-González BA, Ramos-Pulido S, Gerard G, Alnowibet K, Fritsche-Neto R, Montesinos-López A, Crossa J. Feature engineering of environmental covariates improves plant genomic-enabled prediction. FRONTIERS IN PLANT SCIENCE 2024;15:1349569. [PMID: 38812738 PMCID: PMC11135473 DOI: 10.3389/fpls.2024.1349569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 12/04/2023] [Accepted: 04/11/2024] [Indexed: 05/31/2024]

Shi H, Zhang Y, Yu Z, Yang Y. Reservoir temperature prediction based on characterization of water chemistry data-case study of western Anatolia, Turkey. Sci Rep 2024;14:10339. [PMID: 38710719 DOI: 10.1038/s41598-024-59409-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 04/10/2024] [Indexed: 05/08/2024] Open

Alireza Z, Maleeha M, Kaikkonen M, Fortino V. Enhancing prediction accuracy of coronary artery disease through machine learning-driven genomic variant selection. J Transl Med 2024;22:356. [PMID: 38627847 PMCID: PMC11020205 DOI: 10.1186/s12967-024-05090-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2023] [Accepted: 03/14/2024] [Indexed: 04/19/2024] Open

Abstract

Machine learning (ML) methods are increasingly becoming crucial in genome-wide association studies for identifying key genetic variants or SNPs that statistical methods might overlook. Statistical methods predominantly identify SNPs with notable effect sizes by conducting association tests on individual genetic variants, one at a time, to determine their relationship with the target phenotype. These genetic variants are then used to create polygenic risk scores (PRSs), estimating an individual's genetic risk for complex diseases like cancer or cardiovascular disorders. Unlike traditional methods, ML algorithms can identify groups of low-risk genetic variants that improve prediction accuracy when combined in a mathematical model. However, the application of ML strategies requires addressing the feature selection challenge to prevent overfitting. Moreover, ensuring the ML model depends on a concise set of genomic variants enhances its clinical applicability, where testing is feasible for only a limited number of SNPs. In this study, we introduce a robust pipeline that applies ML algorithms in combination with feature selection (ML-FS algorithms), aimed at identifying the most significant genomic variants associated with the coronary artery disease (CAD) phenotype. The proposed computational approach was tested on individuals from the UK Biobank, differentiating between CAD and non-CAD individuals within this extensive cohort, and benchmarked against standard PRS-based methodologies like LDpred2 and Lassosum. Our strategy incorporates cross-validation to ensure a more robust evaluation of genomic variant-based prediction models. This method is commonly applied in machine learning strategies but has often been neglected in previous studies assessing the predictive performance of polygenic risk scores. 
Our results demonstrate that the ML-FS algorithm can identify panels with as few as 50 genetic markers that can achieve approximately 80% accuracy when used in combination with known risk factors. The modest increase in accuracy over PRS performances is noteworthy, especially considering that PRS models incorporate a substantially larger number of genetic variants. This extensive variant selection can pose practical challenges in clinical settings. Additionally, the proposed approach revealed novel CAD-genetic variant associations.

Asnicar F, Thomas AM, Passerini A, Waldron L, Segata N. Machine learning for microbiologists. Nat Rev Microbiol 2024;22:191-205. [PMID: 37968359 PMCID: PMC11980903 DOI: 10.1038/s41579-023-00984-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 44.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/03/2023] [Indexed: 11/17/2023]

Alemu A, Åstrand J, Montesinos-López OA, Isidro Y Sánchez J, Fernández-Gónzalez J, Tadesse W, Vetukuri RR, Carlsson AS, Ceplitis A, Crossa J, Ortiz R, Chawade A. Genomic selection in plant breeding: Key factors shaping two decades of progress. MOLECULAR PLANT 2024;17:552-578. [PMID: 38475993 DOI: 10.1016/j.molp.2024.03.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/03/2023] [Revised: 01/22/2024] [Accepted: 03/08/2024] [Indexed: 03/14/2024]

Ćeran M, Đorđević V, Miladinović J, Vasiljević M, Đukić V, Ranđelović P, Jaćimović S. Selective Genotyping and Phenotyping for Optimization of Genomic Prediction Models for Populations with Different Diversity. PLANTS (BASEL, SWITZERLAND) 2024;13:975. [PMID: 38611503 PMCID: PMC11013471 DOI: 10.3390/plants13070975] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/26/2024] [Revised: 03/22/2024] [Accepted: 03/24/2024] [Indexed: 04/14/2024]

Abstract

To overcome the different challenges to food security caused by a growing population and climate change, soybean (Glycine max (L.) Merr.) breeders are creating novel cultivars that have the potential to improve productivity while maintaining environmental sustainability. Genomic selection (GS) is an advanced approach that may accelerate the rate of genetic gain in breeding using genome-wide molecular markers. The accuracy of genomic selection can be affected by trait architecture and heritability, marker density, linkage disequilibrium, statistical models, and training set. The selection of a minimal and optimal marker set with high prediction accuracy can lower genotyping costs, computational time, and multicollinearity. Selective phenotyping could reduce the number of genotypes tested in the field while preserving the genetic diversity of the initial population. This study aimed to evaluate different methods of selective genotyping and phenotyping on the accuracy of genomic prediction for soybean yield. The evaluation was performed on three populations: recombinant inbred lines, multifamily diverse lines, and germplasm collection. Strategies adopted for marker selection were as follows: SNP (single nucleotide polymorphism) pruning, estimation of marker effects, randomly selected markers, and genome-wide association study. Reduction of the number of genotypes was performed by selecting a core set from the initial population based on marker data, yet maintaining the original population's genetic diversity. Prediction ability using all markers and genotypes was different among examined populations. The subsets obtained by the model-based strategy can be considered the most suitable for marker selection for all populations. 
Selective phenotyping based on markers had, in all cases, higher prediction ability than the minimum prediction ability across multiple cycles of random selection, with the highest prediction values obtained using the AN approach and 75% population size. The results indicate that selective genotyping and phenotyping hold great potential and can be integrated as tools for improving or retaining selection accuracy while reducing genotyping or phenotyping costs for genomic selection.

Tutsoy O, Koç GG. Deep self-supervised machine learning algorithms with a novel feature elimination and selection approaches for blood test-based multi-dimensional health risks classification. BMC Bioinformatics 2024;25:103. [PMID: 38459463 PMCID: PMC10921629 DOI: 10.1186/s12859-024-05729-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2023] [Accepted: 03/04/2024] [Indexed: 03/10/2024] Open

Sanchez-Trigo H, Molina-Martínez E, Grimaldi-Puyana M, Sañudo B. Effects of lifestyle behaviours and depressed mood on sleep quality in young adults. A machine learning approach. Psychol Health 2024;39:128-143. [PMID: 35475409 DOI: 10.1080/08870446.2022.2067331] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2021] [Accepted: 04/05/2022] [Indexed: 10/18/2022]

Ly QV, Tong NA, Lee BM, Nguyen MH, Trung HT, Le Nguyen P, Hoang THT, Hwang Y, Hur J. Improving algal bloom detection using spectroscopic analysis and machine learning: A case study in a large artificial reservoir, South Korea. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023;901:166467. [PMID: 37611716 DOI: 10.1016/j.scitotenv.2023.166467] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/04/2023] [Revised: 08/17/2023] [Accepted: 08/19/2023] [Indexed: 08/25/2023]

Abstract

The prediction of algal blooms using traditional water quality indicators is expensive, labor-intensive, and time-consuming, making it challenging to meet the critical requirement of timely monitoring for prompt management. Using optical measures for forecasting algal blooms is a feasible and useful method to overcome these problems. This study explores the potential application of optical measures to enhance algal bloom prediction in terms of prediction accuracy and workload reduction, aided by machine learning (ML) models. Compared to absorption-derived parameters, commonly used fluorescence indices such as the fluorescence index (FI), humification index (HIX), biological index (BIX), and protein-like component improved the prediction accuracy. However, the prediction accuracy was decreased when all optical indices were considered for computation due to increased noise and uncertainty in the models. With the exception of chemical oxygen demand (COD), this study successfully replaced biochemical oxygen demand (BOD), dissolved organic carbon (DOC), and nutrients with selected fluorescence indices, demonstrating relatively analogous performance in either training or testing data, with consistent and good coefficient of determination (R2) values of approximately 0.85 and 0.74, respectively. Among all models considered, ensemble learning models consistently outperformed conventional regression models and artificial neural networks (ANNs). However, there was a trade-off between accuracy and computation efficiency among the ensemble learning models (i.e., Stacking and XGBoost) for algal bloom prediction. Our study offers a glimpse of the potential application of spectroscopic measures to improve accuracy and efficiency in algal bloom prediction, but further work should be carried out in other water bodies to further validate our proposed hypothesis.

Heinrich F, Lange TM, Kircher M, Ramzan F, Schmitt AO, Gültas M. Exploring the potential of incremental feature selection to improve genomic prediction accuracy. Genet Sel Evol 2023;55:78. [PMID: 37946104 PMCID: PMC10634161 DOI: 10.1186/s12711-023-00853-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Accepted: 11/02/2023] [Indexed: 11/12/2023] Open

Abstract

BACKGROUND

The ever-increasing availability of high-density genomic markers in the form of single nucleotide polymorphisms (SNPs) enables genomic prediction, i.e. the inference of phenotypes based solely on genomic data, in the field of animal and plant breeding, where it has become an important tool. However, given the limited number of individuals, the abundance of variables (SNPs) can reduce the accuracy of prediction models due to overfitting or irrelevant SNPs. Feature selection can help to reduce the number of irrelevant SNPs and increase the model performance. In this study, we investigated an incremental feature selection approach based on ranking the SNPs according to the results of a genome-wide association study that we combined with random forest as a prediction model, and we applied it on several animal and plant datasets.

RESULTS

Applying our approach to different datasets yielded a wide range of outcomes, i.e. from a substantial increase in prediction accuracy in a few cases to minor improvements when only a fraction of the available SNPs were used. Compared with models using all available SNPs, our approach was able to achieve comparable performances with a considerably reduced number of SNPs in several cases. Our approach showcased state-of-the-art efficiency and performance while having a faster computation time.

CONCLUSIONS

The results of our study suggest that our incremental feature selection approach has the potential to improve prediction accuracy substantially. However, this gain seems to depend on the genomic data used. Even for datasets where the number of markers is smaller than the number of individuals, feature selection may still increase the performance of the genomic prediction. Our approach is implemented in R and is available at https://github.com/FelixHeinrich/GP_with_IFS/ .
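
The incremental procedure described in this abstract can be illustrated with a minimal sketch. It is not the authors' R implementation: here the GWAS ranking is replaced by absolute marginal correlation with the phenotype, and the random forest is replaced by a toy correlation-weighted score, both chosen only to keep the example dependency-free. All function names and the simulated data are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def column(X, j):
    """Extract feature j from a row-major matrix."""
    return [row[j] for row in X]


def rank_features(X, y):
    """Rank feature indices by |marginal correlation| (assumed stand-in for a GWAS ranking)."""
    scores = [abs(pearson(column(X, j), y)) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def score_model(X_train, y_train, X_val, y_val, features):
    """Toy predictor (assumed stand-in for random forest): features weighted by their
    training correlation with y; returns the prediction/phenotype correlation on validation rows."""
    weights = {j: pearson(column(X_train, j), y_train) for j in features}
    preds = [sum(weights[j] * row[j] for j in features) for row in X_val]
    return pearson(preds, y_val)


def incremental_feature_selection(X_train, y_train, X_val, y_val, step=1):
    """Grow the feature set along the ranking and keep the prefix with the best validation score."""
    ranking = rank_features(X_train, y_train)
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranking) + 1, step):
        s = score_model(X_train, y_train, X_val, y_val, ranking[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranking[:best_k], best_score


if __name__ == "__main__":
    # Simulated genotypes coded 0/1/2; only the first two SNPs affect the phenotype.
    rng = random.Random(0)
    X = [[rng.choice((0, 1, 2)) for _ in range(30)] for _ in range(200)]
    y = [2 * row[0] + 2 * row[1] + rng.gauss(0, 0.5) for row in X]
    selected, score = incremental_feature_selection(X[:150], y[:150], X[150:], y[150:])
    print(len(selected), round(score, 3))
```

The ranking is computed on the training rows only, so the validation score is not contaminated by the selection step.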

Zhang Y, Zhang M, Ye J, Xu Q, Feng Y, Xu S, Hu D, Wei X, Hu P, Yang Y. Integrating genome-wide association study into genomic selection for the prediction of agronomic traits in rice (Oryza sativa L.). MOLECULAR BREEDING : NEW STRATEGIES IN PLANT IMPROVEMENT 2023;43:81. [PMID: 37965378 PMCID: PMC10641074 DOI: 10.1007/s11032-023-01423-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 07/11/2023] [Accepted: 10/09/2023] [Indexed: 11/16/2023]

Affiliation(s)

Yuanyuan Zhang Zhejiang Lab, Hangzhou, 311121 China CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Mengchen Zhang Zhejiang Lab, Hangzhou, 311121 China CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, 572024 China
Junhua Ye CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Qun Xu CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Yue Feng CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, 572024 China
Siliang Xu CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Dongxiu Hu CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Xinghua Wei Zhejiang Lab, Hangzhou, 311121 China CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, 572024 China
Peisong Hu Zhejiang Lab, Hangzhou, 311121 China CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China
Yaolong Yang Zhejiang Lab, Hangzhou, 311121 China CNRRI-Zhejiang Lab Computational Breeding Joint Laboratory, China National Rice Research Institute, Hangzhou, China National Nanfan Research Institute (Sanya), Chinese Academy of Agricultural Sciences, Sanya, 572024 China

Gebski V, Silva SSM, Byth K, Jenkins A, Keech A. Improving efficiency of fitting Cox proportional hazards models for time-to-event outcomes in genome-wide association studies (GWAS). BIOINFORMATICS ADVANCES 2023;3:vbad148. [PMID: 37928342 PMCID: PMC10625458 DOI: 10.1093/bioadv/vbad148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 10/02/2023] [Accepted: 10/11/2023] [Indexed: 11/07/2023]

Abstract

Summary

Technologies identifying single nucleotide polymorphisms (SNPs) in DNA sequencing yield an avalanche of data requiring analysis and interpretation. Standard methods may require many weeks of processing time. The use of statistical methods requiring data sorting, high-dimensional matrix inversions, and replication in subsets of the data on multiple outcomes exacerbates these times. A method is proposed which reduces the computational time in problems with time-to-event outcomes and hundreds of thousands or millions of SNPs, using Cox-Snell residuals after fitting the Cox proportional hazards (PH) model to a fixed set of concomitant variables. This yields coefficients for the SNP effect from a Cox-Snell adjusted Poisson model and shows high concordance with the adjusted PH model. The method is illustrated with a sample of 10 000 SNPs from a genome-wide association study in a diabetic population. The gain in processing efficiency using the proposed method based on Poisson modelling can be as high as 62%. This could result in a saving of over three weeks of processing time if 5 million SNPs require analysis. The method involves only a single predictor variable (SNP), offering a simpler, computationally more stable approach to examining and identifying SNP patterns associated with the outcome(s), allowing for faster development of genetic signatures. Use of deviance residuals from the PH model to screen SNPs demonstrates a large discordance rate at a 0.2% threshold of concordance. This rate is 15 times larger than that based on the Cox-Snell residuals from the Cox-Snell adjusted Poisson model.

Availability and implementation

The method is simple to implement, as the procedures are available in most statistical packages. The approach involves obtaining Cox-Snell residuals from a PH model, fitted to a binary time-to-event outcome, for the factors that need to be common when assessing each SNP. Each SNP is then fitted as a predictor of the outcome of interest using a Poisson model with the Cox-Snell residual as the exposure variable.
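
One way to write down the two steps described above is sketched here. This is a reading of the abstract, not the authors' stated specification: the symbols ($\mathbf{z}_i$ for the fixed covariates, $g_{ij}$ for the genotype of individual $i$ at SNP $j$, $\delta_i$ for the event indicator) and the per-SNP intercept $\alpha_j$ are assumptions introduced for illustration.

```latex
\begin{align}
  \hat r_i &= \hat H_0(t_i)\,\exp\!\big(\mathbf{z}_i^{\top}\hat{\boldsymbol\beta}\big)
    && \text{Cox-Snell residual from the PH model fitted once to } \mathbf{z}_i \\
  \delta_i \mid g_{ij} &\sim \operatorname{Poisson}(\mu_{ij}), \qquad
  \log \mu_{ij} = \log \hat r_i + \alpha_j + \gamma_j\, g_{ij}
    && \text{one Poisson fit per SNP } j \text{, with } \hat r_i \text{ as exposure}
\end{align}
```

Under this reading, the PH model is fitted only once and each SNP requires a single-predictor Poisson regression with offset $\log \hat r_i$, which would be consistent with the efficiency gain the abstract reports.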

Lee YS, Oh JD, Lee JY, Shin D. A genomic estimated breeding value-assisted reduction method of single nucleotide polymorphism sets: a novel approach for determining the cutoff thresholds in genome-wide association studies and best linear unbiased prediction. Anim Cells Syst (Seoul) 2023;27:180-186. [PMID: 37674816 PMCID: PMC10478620 DOI: 10.1080/19768354.2023.2250841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2023] [Revised: 06/16/2023] [Accepted: 07/20/2023] [Indexed: 09/08/2023] Open

Mohseni N, Ghaniee Zarich M, Afshar S, Hosseini M. Identification of Novel Biomarkers for Response to Preoperative Chemoradiation in Locally Advanced Rectal Cancer with Genetic Algorithm-Based Gene Selection. J Gastrointest Cancer 2023;54:937-950. [PMID: 36534304 DOI: 10.1007/s12029-022-00873-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 10/05/2022] [Indexed: 12/23/2022]

Abstract

BACKGROUND

The conventional treatment for patients with locally advanced colorectal tumors is preoperative chemo-radiotherapy (PCRT) preceding surgery. This treatment strategy has some long-term side effects, and some patients do not respond to it. Therefore, an evaluation of biomarkers that may help predict patients' response to PCRT is essential.

METHODS

We used a genetic algorithm to search the space of possible feature combinations and choose subsets of genes that would yield acceptable performance in differentiating PCRT responders from non-responders, using a logistic regression model as our classifier.

RESULTS

We developed two gene signatures; first, to achieve the maximum prediction accuracy, the algorithm yielded 39 genes, and then, aiming to reduce the feature numbers as much as possible (while maintaining acceptable performance), a 5-gene signature was chosen. The performance of the two gene signatures was (accuracy = 0.97 and 0.81, sensitivity = 0.96 and 0.83, and specificity = 86 and 0.77) using a logistic regression classifier. Through analyzing bias and variance decomposition of the model error, we further investigated the involved genes by discovering and validating another 28-gene signature which possibly points towards two different sub-systems involved in the response of the patients to treatment.

CONCLUSIONS

Using a genetic algorithm as our gene selection method, we have identified two groups of genes that can differentiate PCRT responders from non-responders in patients of the studied dataset with considerable performance.

IMPACT

After passing standard requirements, our gene signatures may be applicable as a robust and effective PCRT response prediction tool for colorectal cancer patients in clinical settings and may also help future studies aiming to further investigate involved pathways gain a clearer picture for the course of their research.
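
The kind of genetic-algorithm subset search described in the METHODS paragraph can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' implementation: the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), all parameter defaults, and the toy fitness function are hypothetical. In a real analysis the fitness would be something like the cross-validated accuracy of a logistic regression classifier on the selected genes.

```python
import random


def genetic_feature_selection(n_features, fitness, pop_size=40, generations=60,
                              mutation_rate=0.05, seed=0):
    """Search boolean feature masks that maximise `fitness(mask)`.

    Assumes n_features >= 2 and generations >= 1.
    Returns (sorted list of selected indices, best fitness seen).
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]

    def tournament(scored):
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    best_score, best_mask = float("-inf"), None
    for _ in range(generations):
        scored = [(fitness(mask), mask) for mask in population]
        gen_best = max(scored, key=lambda t: t[0])
        if gen_best[0] > best_score:
            best_score, best_mask = gen_best[0], list(gen_best[1])

        next_pop = [list(best_mask)]  # elitism: carry the best mask forward unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            cut = rng.randrange(1, n_features)          # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]                    # bit-flip mutation
            next_pop.append(child)
        population = next_pop

    selected = [i for i, g in enumerate(best_mask) if g]
    return selected, best_score


if __name__ == "__main__":
    # Toy fitness (hypothetical): reward three "informative" features, penalise subset size.
    informative = {2, 5, 7}

    def toy_fitness(mask):
        chosen = [i for i, g in enumerate(mask) if g]
        return sum(1 for i in chosen if i in informative) - 0.1 * len(chosen)

    print(genetic_feature_selection(12, toy_fitness))
```

The size penalty in the toy fitness plays the role of the trade-off mentioned in the RESULTS paragraph, where a smaller signature is accepted at some cost in accuracy.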

Mahmoudi A, Butler AE, Banach M, Jamialahmadi T, Sahebkar A. Identification of Potent Small-Molecule PCSK9 Inhibitors Based on Quantitative Structure-Activity Relationship, Pharmacophore Modeling, and Molecular Docking Procedure. Curr Probl Cardiol 2023;48:101660. [PMID: 36841313 DOI: 10.1016/j.cpcardiol.2023.101660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2023] [Accepted: 02/17/2023] [Indexed: 02/27/2023]

Abstract

The leading cause of atherosclerotic cardiovascular disease (ASCVD) is elevated low-density lipoprotein cholesterol (LDL-C). Proprotein convertase subtilisin/kexin type 9 (PCSK9) attaches to the domain of LDL receptor (LDLR), diminishing LDL-C influx and LDLR cell surface presentation in hepatocytes, resulting in higher circulating LDL-C levels. PCSK9 dysfunction has been linked to lower levels of plasma LDL-C and a decreased risk of coronary heart disease (CHD). Herein, using virtual screening tools, we aimed to identify a potent small-molecule PCSK9 inhibitor among compounds that are currently being studied in clinical trials. We first performed chemical absorption, distribution, metabolism, excretion, and toxicity (ADMET) filtering of 9800 clinical trial compounds obtained from the ZINC 15 database using Lipinski's rule of 5 and obtained 3853 compounds. Two-dimensional (2D) quantitative structure-activity relationship (QSAR) modeling was initiated by computing molecular descriptors and selecting important descriptors of 23 PCSK9 inhibitors. Multivariate calibration was performed with the partial least squares (PLS) regression method, with 18 compounds for training to design the QSAR model and 5 compounds for the test set to assess the model. The best number of latent variables (LV = 6), with the lowest root-mean-square error of cross-validation (RMSECV) of 0.48 and a leave-one-out cross-validation correlation coefficient (R²CV) of 0.83, was obtained for the QSAR model. The low RMSEC (0.21) with high R²cal (0.966) indicates a good fit between the experimental data and the calibration model. Using QSAR analysis of 3853 compounds, 2635 had a pIC50<1 and were considered for pharmacophore screening. The PHASE module (a complete package for pharmacophore modeling) designed the pharmacophore hypothesis through multiple ligands. The top 14 compounds (pIC50>1) were defined as active, whereas 9 (pIC50<1) were considered as the inactive set. 
Three five-point pharmacophore hypotheses achieved the highest score: DHHRR1, DHHRR2, and DHRRR1. The best model, with the highest survival score (5.365), was DHHRR1, comprising 1 hydrogen-bond donor (D), 2 hydrophobic groups (H), and 2 aromatic rings (R). We selected the molecules with a fitness score higher than 1.5 (257 compounds) in pharmacophore screening (DHHRR1) for molecular docking screening. Molecular docking indicates that ZINC000051951669, with a binding affinity of -13.2 kcal/mol and 2 H-bonds, binds most strongly to the PCSK9 protein. ZINC000011726230 (binding energy -11.4 kcal/mol, 3 H-bonds), ZINC000068248147 (binding affinity -10.7 kcal/mol, 1 H-bond), and ZINC000029134440 (binding affinity -10.6 kcal/mol, 4 H-bonds) were ranked next, in that order. To conclude, the molecules identified as candidate PCSK9 inhibitors, especially ZINC000051951669, may significantly inhibit PCSK9 and should be considered in newly designed trials.
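
The first filtering step in this abstract, Lipinski's rule of 5, is simple enough to sketch. The thresholds below are the commonly cited ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The abstract does not say how many violations the authors tolerated, so `max_violations` is an assumed parameter, and the `Compound` class and the descriptor values in the example are hypothetical, not real ZINC entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Minimal descriptor record; values would normally come from a cheminformatics toolkit."""
    name: str
    mol_weight: float        # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(c):
    """Count how many of the four rule-of-5 thresholds the compound exceeds."""
    return sum([
        c.mol_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def passes_rule_of_five(c, max_violations=1):
    """A common convention allows at most one violation; set max_violations=0 to be strict."""
    return lipinski_violations(c) <= max_violations


if __name__ == "__main__":
    # Hypothetical descriptor values for illustration only.
    library = [
        Compound("hypothetical-A", 350.0, 2.1, 2, 5),
        Compound("hypothetical-B", 720.0, 6.3, 6, 12),
    ]
    kept = [c.name for c in library if passes_rule_of_five(c)]
    print(kept)
```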

Mowlaei ME, Shi X. FSF-GA: A Feature Selection Framework for Phenotype Prediction Using Genetic Algorithms. Genes (Basel) 2023;14:genes14051059. [PMID: 37239419 DOI: 10.3390/genes14051059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2023] [Revised: 05/04/2023] [Accepted: 05/06/2023] [Indexed: 05/28/2023] Open

Kisiel A, Krzemińska A, Cembrowska-Lech D, Miller T. Data Science and Plant Metabolomics. Metabolites 2023;13:metabo13030454. [PMID: 36984894 PMCID: PMC10054611 DOI: 10.3390/metabo13030454] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2023] [Revised: 03/16/2023] [Accepted: 03/17/2023] [Indexed: 03/30/2023] Open

Marjit S, Bhattacharyya T, Chatterjee B, Sarkar R. Simulated annealing aided genetic algorithm for gene selection from microarray data. Comput Biol Med 2023;158:106854. [PMID: 37023541 DOI: 10.1016/j.compbiomed.2023.106854] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/07/2022] [Revised: 02/26/2023] [Accepted: 03/30/2023] [Indexed: 04/03/2023]

Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2023;11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 01/04/2023] [Indexed: 01/12/2023] Open

Zhang S, Wang J, Li X, Liang Y. M6A-GSMS: Computational identification of N⁶-methyladenosine sites with GBDT and stacking learning in multiple species. J Biomol Struct Dyn 2022;40:12380-12391. [PMID: 34459713 DOI: 10.1080/07391102.2021.1970628] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Kim M, Bae J, Wang B, Ko H, Lim JS. Feature Selection Method Using Multi-Agent Reinforcement Learning Based on Guide Agents. SENSORS (BASEL, SWITZERLAND) 2022;23:98. [PMID: 36616694 PMCID: PMC9823489 DOI: 10.3390/s23010098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/23/2022] [Revised: 12/12/2022] [Accepted: 12/16/2022] [Indexed: 06/17/2023]

Wilkinson MJ, Yamashita R, James ME, Bally ISE, Dillon NL, Ali A, Hardner CM, Ortiz-Barrientos D. The influence of genetic structure on phenotypic diversity in the Australian mango (Mangifera indica) gene pool. Sci Rep 2022;12:20614. [PMID: 36450793 PMCID: PMC9712640 DOI: 10.1038/s41598-022-24800-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/05/2022] [Accepted: 11/21/2022] [Indexed: 12/11/2022] Open

Do open citations give insights on the qualitative peer-review evaluation in research assessments? An analysis of the Italian National Scientific Qualification. Scientometrics 2022. [DOI: 10.1007/s11192-022-04581-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

Li Z, Li W, Yan W, Zhang R, Xie S. Data-driven learning to identify biomarkers in bipolar disorder. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2022;226:107112. [PMID: 36156436 DOI: 10.1016/j.cmpb.2022.107112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/16/2021] [Revised: 06/09/2022] [Accepted: 09/04/2022] [Indexed: 06/16/2023]

A divide-and-conquer approach for genomic prediction in rubber tree using machine learning. Sci Rep 2022;12:18023. [PMID: 36289298 PMCID: PMC9605989 DOI: 10.1038/s41598-022-20416-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/19/2022] [Accepted: 09/13/2022] [Indexed: 01/20/2023] Open

Dutta A, Hasan MK, Ahmad M, Awal MA, Islam MA, Masud M, Meshref H. Early Prediction of Diabetes Using an Ensemble of Machine Learning Models. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2022;19:12378. [PMID: 36231678 PMCID: PMC9566114 DOI: 10.3390/ijerph191912378] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/28/2022] [Revised: 09/20/2022] [Accepted: 09/24/2022] [Indexed: 05/15/2023]

Cho E, Cho S, Kim M, Ediriweera TK, Seo D, Lee SS, Cha J, Jin D, Kim YK, Lee JH. Single nucleotide polymorphism marker combinations for classifying Yeonsan Ogye chicken using a machine learning approach. JOURNAL OF ANIMAL SCIENCE AND TECHNOLOGY 2022;64:830-841. [PMID: 36287747 PMCID: PMC9574617 DOI: 10.5187/jast.2022.e64] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Revised: 07/15/2022] [Accepted: 08/01/2022] [Indexed: 11/27/2022]

Multichannel Acoustic Spectroscopy of the Human Body for Inviolable Biometric Authentication. BIOSENSORS 2022;12:700. [PMID: 36140085 PMCID: PMC9496529 DOI: 10.3390/bios12090700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/12/2022] [Revised: 08/26/2022] [Accepted: 08/29/2022] [Indexed: 11/17/2022]

Shen C, Zhang K. Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification. COMPLEX INTELL SYST 2022. [DOI: 10.1007/s40747-021-00452-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

Abstract

In recent years, evolutionary algorithms have shown great advantages in the field of feature selection because of their simplicity and potential global search capability. However, most of the existing feature selection algorithms based on evolutionary computation are wrapper methods, which are computationally expensive, especially for high-dimensional biomedical data. To significantly reduce the computational cost, it is essential to study an effective evaluation method. In this paper, a two-stage improved gray wolf optimization (IGWO) algorithm for feature selection on high-dimensional data is proposed. In the first stage, a multilayer perceptron (MLP) network with group lasso regularization terms is first trained to construct an integer optimization problem using the proposed algorithm for pre-selection of features and optimization of the hidden layer structure. The dataset is compressed using the feature subset obtained in the first stage. In the second stage, a multilayer perceptron network with group lasso regularization terms is retrained using the compressed dataset, and the proposed algorithm is employed to construct the discrete optimization problem for feature selection. Meanwhile, a rapid evaluation strategy is constructed to mitigate the evaluation cost and improve the evaluation efficiency in the feature selection process. The effectiveness of the algorithm was analyzed on ten gene expression datasets. The experimental results show that the proposed algorithm not only removes almost more than 95.7% of the features in all datasets, but also has better classification accuracy on the test set. In addition, the advantages of the proposed algorithm in terms of time consumption, classification accuracy and feature subset size become more and more prominent as the dimensionality of the feature selection problem increases. This indicates that the proposed algorithm is particularly suitable for solving high-dimensional feature selection problems.

Farooq M, van Dijk AD, Nijveen H, Mansoor S, de Ridder D. Genomic prediction in plants: opportunities for ensemble machine learning based approaches. F1000Res 2022;11:802. [PMID: 37035464 PMCID: PMC10080209 DOI: 10.12688/f1000research.122437.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Accepted: 07/08/2022] [Indexed: 12/15/2022] Open

Jha AK, Mithun S, Purandare NC, Kumar R, Rangarajan V, Wee L, Dekker A. Radiomics: a quantitative imaging biomarker in precision oncology. Nucl Med Commun 2022;43:483-493. [PMID: 35131965 DOI: 10.1097/mnm.0000000000001543] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]

Yang Z, Liu X, Li T, Wu D, Wang J, Zhao Y, Han H. A systematic literature review of methods and datasets for anomaly-based network intrusion detection. Comput Secur 2022. [DOI: 10.1016/j.cose.2022.102675] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

A2BCF: An Automated ABC-Based Feature Selection Algorithm for Classification Models in an Education Application. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12073553] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Sheh A, Artim SC, Burns MA, Molina-Mora JA, Lee MA, Dzink-Fox J, Muthupalani S, Fox JG. Alterations in common marmoset gut microbiome associated with duodenal strictures. Sci Rep 2022;12:5277. [PMID: 35347206 PMCID: PMC8960757 DOI: 10.1038/s41598-022-09268-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2021] [Accepted: 03/21/2022] [Indexed: 12/13/2022] Open

Abstract

Chronic gastrointestinal (GI) diseases are the most common diseases in captive common marmosets (Callithrix jacchus). Despite standardized housing, diet and husbandry, a recently described gastrointestinal syndrome characterized by duodenal ulcers and strictures was observed in a subset of marmosets sourced from the New England Primate Research Center. As changes in the gut microbiome have been associated with GI diseases, the gut microbiome of 52 healthy, non-stricture marmosets (153 samples) was compared to the gut microbiome of 21 captive marmosets diagnosed with a duodenal ulcer/stricture (57 samples). No significant changes were observed using alpha diversity metrics, and while the community structure was significantly different when comparing beta diversity between healthy and stricture cases, the results were inconclusive due to differences observed in the dispersion of both datasets. Differences in the abundance of individual taxa were observed using ANCOM, as stricture-associated dysbiosis was characterized by Anaerobiospirillum loss and Clostridium perfringens increases. To identify microbial and serum biomarkers that could help classify stricture cases, we developed models using machine learning algorithms (random forest, classification and regression trees, support vector machines and k-nearest neighbors) to classify microbiome, serum chemistry or complete blood count (CBC) data. Random forest (RF) models were the most accurate models and correctly classified strictures using either 9 ASVs (amplicon sequence variants), 4 serum chemistry tests or 6 CBC tests. Based on the RF model and ANCOM results, C. perfringens was identified as a potential causative agent associated with the development of strictures. Clostridium perfringens was also isolated by microbiological culture in 4 of 9 duodenum samples from marmosets with histologically confirmed strictures. Due to the enrichment of C. perfringens in situ, we analyzed frozen duodenal tissues using both 16S microbiome profiling and RNAseq. Microbiome analysis of the duodenal tissues of 29 marmosets from the MIT colony confirmed an increased abundance of Clostridium in stricture cases. Comparison of the duodenal gene expression from stricture and non-stricture marmosets found enrichment of genes associated with intestinal absorption, and lipid metabolism, localization, and transport in stricture cases. Using machine learning, we identified increased abundance of C. perfringens as a potential causative agent of GI disease and intestinal strictures in marmosets.

Ferrucci R, Mameli F, Ruggiero F, Reitano M, Miccoli M, Gemignani A, Conversano C, Dini M, Zago S, Piacentini S, Poletti B, Priori A, Orrù G. Alternate fluency in Parkinson’s disease: A machine learning analysis. PLoS One 2022;17:e0265803. [PMID: 35320291 PMCID: PMC8942276 DOI: 10.1371/journal.pone.0265803] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/23/2020] [Accepted: 03/08/2022] [Indexed: 11/18/2022] Open

Melek Manshouri N. Identifying COVID-19 by using spectral analysis of cough recordings: a distinctive classification study. Cogn Neurodyn 2022;16:239-253. [PMID: 34341676 PMCID: PMC8320312 DOI: 10.1007/s11571-021-09695-w] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2021] [Revised: 05/04/2021] [Accepted: 07/01/2021] [Indexed: 12/19/2022] Open

Abstract

Sound signals from the respiratory system are largely taken as tokens of human health. Early diagnosis of respiratory tract diseases is of great importance because, if delayed, it exerts irreversible effects on human health. The Coronavirus pandemic, which is deeply shaking the world, has revealed the importance of this diagnosis even more. During the pandemic, it has become the focus of researchers to differentiate symptoms from similar diseases such as influenza. Among these symptoms, the difference in cough sound played a distinctive role in research. Clinical data collected under the supervision of doctors in a reliable environment were used as the dataset consisting of 16 subjects suspected of COVID-19 with a specific patient demographic. Using the polymerase chain reaction test, the suspected subjects were divided into two groups as negative and positive. The negative and positive labels represent the patients with non-COVID and with a COVID-19 cough, respectively. Using the 3D plot or waterfall representation of the signal frequency spectrum, the salient features of the cough data are revealed. In this way, COVID-19 can be differentiated from other coughs by applying effective feature extraction and classification techniques. Power spectral density based on short-time Fourier transform and mel-frequency cepstral coefficients (MFCC) were chosen as the efficient feature extraction method. From among the classification techniques, the support vector machine (SVM) algorithm was applied to the processed signals in order to identify and classify COVID-19 cough. In terms of results evaluation, the cough of subjects with COVID-19 was detected with 95.86% classification accuracy thanks to the radial basis function (RBF) kernel function of SVM and the MFCC method. The diagnosis of COVID-19 coughs was performed with 98.6% and 91.7% sensitivity and specificity, respectively.

Zhang Y, Ma Y, Yang X. Multi-label feature selection based on logistic regression and manifold learning. APPL INTELL 2022. [DOI: 10.1007/s10489-021-03008-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Xu HZ, Peng XR, Liu YR, Lei X, Yu J. Sleep Quality Modulates the Association between Dynamic Functional Network Connectivity and Cognitive Function in Healthy Older Adults. Neuroscience 2022;480:131-142. [PMID: 34785273 DOI: 10.1016/j.neuroscience.2021.11.018] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2021] [Revised: 11/01/2021] [Accepted: 11/08/2021] [Indexed: 12/12/2022]

Masoomi Sefiddashti F, Asadpour S, Haddadi H, Ghanavati Nasab S. QSAR analysis of pyrimidine derivatives as VEGFR-2 receptor inhibitors to inhibit cancer using multiple linear regression and artificial neural network. Res Pharm Sci 2021;16:596-611. [PMID: 34760008 PMCID: PMC8562410 DOI: 10.4103/1735-5362.327506] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/16/2020] [Revised: 04/13/2021] [Accepted: 09/22/2021] [Indexed: 11/15/2022] Open

Sajjadian M, Lam RW, Milev R, Rotzinger S, Frey BN, Soares CN, Parikh SV, Foster JA, Turecki G, Müller DJ, Strother SC, Farzan F, Kennedy SH, Uher R. Machine learning in the prediction of depression treatment outcomes: a systematic review and meta-analysis. Psychol Med 2021;51:2742-2751. [PMID: 35575607 DOI: 10.1017/s0033291721003871] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]

Affiliation(s)

Mehri Sajjadian Department of Psychiatry, Dalhousie University, Halifax, NS, Canada
Raymond W Lam Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
Roumen Milev Department of Psychiatry and Psychology, Queen's University, Providence Care Hospital, Kingston, ON, Canada
Susan Rotzinger Department of Psychiatry, University of Toronto, Toronto, ON, Canada Department of Psychiatry, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada
Benicio N Frey Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton, ON, Canada Mood Disorders Program and Women's Health Concerns Clinic, St. Joseph's Healthcare Hamilton, Hamilton, ON, Canada
Claudio N Soares Department of Psychiatry, Queen's University School of Medicine, Kingston, ON, Canada
Sagar V Parikh Department of Psychiatry, University of Michigan, Ann Arbor, MI, USA
Jane A Foster Department of Psychiatry & Behavioural Neurosciences, St. Joseph's Healthcare, Hamilton, ON, Canada
Gustavo Turecki Department of Psychiatry, Douglas Institute, McGill University, Montreal, QC, Canada
Daniel J Müller Campbell Family Mental Health Research Institute, Center for Addiction and Mental Health, Toronto, ON, Canada Department of Psychiatry, University of Toronto, Toronto, ON, Canada
Stephen C Strother Baycrest and Department of Medical Biophysics, Rotman Research Center, University of Toronto, Toronto, ON, Canada
Faranak Farzan eBrain Lab, School of Mechatronic Systems Engineering, Simon Fraser University, Surrey, BC, Canada
Sidney H Kennedy Department of Psychiatry, University of Toronto, Toronto, ON, Canada Department of Psychiatry, St. Michael's Hospital, University of Toronto, Toronto, Ontario, Canada Department of Psychiatry, University Health Network, Toronto, ON, Canada Krembil Research Centre, University Health Network, University of Toronto, Toronto, ON, Canada
Rudolf Uher Department of Psychiatry, Dalhousie University, Halifax, NS, Canada

Ly QV, Nguyen XC, Lê NC, Truong TD, Hoang THT, Park TJ, Maqbool T, Pyo J, Cho KH, Lee KS, Hur J. Application of Machine Learning for eutrophication analysis and algal bloom prediction in an urban river: A 10-year study of the Han River, South Korea. THE SCIENCE OF THE TOTAL ENVIRONMENT 2021;797:149040. [PMID: 34311376 DOI: 10.1016/j.scitotenv.2021.149040] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/20/2021] [Revised: 06/29/2021] [Accepted: 07/10/2021] [Indexed: 06/13/2023]

Abstract

The increasing release of nutrients to aquatic environments has led to great concern regarding eutrophication and the risk of unwanted algal blooms. Based on observational data of 20 water quality parameters measured on a monthly basis at 40 stations from 2011 to 2020, this study applied different Machine Learning (ML) algorithms to suggest the best option for algal bloom prediction in the Han River, a large river in South Korea. Eight different ML algorithms were categorized into several groups of statistical learning, regression family, and deep learning, and were then compared for their suitability to predict the chlorophyll-derived trophic index (TSI-Chla). ML algorithms helped identify the most important water quality parameters contributing to algal bloom prediction. The ML results confirmed that eutrophication and algal proliferation were governed by the complex interplay between nutrients (nitrogen and phosphorus), organic contaminants, and environmental factors. Of the models tested, the adaptive neuro-fuzzy inference system (ANFIS) exhibited the best performance owing to its consistent and outperforming prediction both quantitatively (i.e., via regression) and qualitatively (i.e., via classification), which was evidenced by the lowest value of mean absolute error (MAE) of 0.09, and the highest F1-score, Recall and Precision of 0.97, 0.98 and 0.96, respectively. In a further step, a representative web application was constructed to assist common users to predict the trophic status of the Han River. This study demonstrated that ML techniques are not only promising for highly accurate water quality modeling of urban rivers, but also reduce time and labor intensity for experiments, which decreases the number of monitored water quality parameters, providing further insights into the driving factors of water quality deterioration. They ultimately help devise proactive strategies for sustainable water management.

Differentiation of Cystic Fibrosis-Related Pathogens by Volatile Organic Compound Analysis with Secondary Electrospray Ionization Mass Spectrometry. Metabolites 2021;11:773. [PMID: 34822431 PMCID: PMC8617967 DOI: 10.3390/metabo11110773] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 11/05/2021] [Accepted: 11/08/2021] [Indexed: 12/02/2022] Open

Dalvie S, Chatzinakos C, Al Zoubi O, Georgiadis F, PGC-PTSD Systems Biology workgroup, Lancashire L, Daskalakis NP. From genetics to systems biology of stress-related mental disorders. Neurobiol Stress 2021;15:100393. [PMID: 34584908 PMCID: PMC8456113 DOI: 10.1016/j.ynstr.2021.100393] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2021] [Revised: 07/22/2021] [Accepted: 09/08/2021] [Indexed: 01/20/2023] Open

Monaco A, Pantaleo E, Amoroso N, Lacalamita A, Lo Giudice C, Fonzino A, Fosso B, Picardi E, Tangaro S, Pesole G, Bellotti R. A primer on machine learning techniques for genomic applications. Comput Struct Biotechnol J 2021;19:4345-4359. [PMID: 34429852 PMCID: PMC8365460 DOI: 10.1016/j.csbj.2021.07.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2021] [Revised: 07/23/2021] [Accepted: 07/23/2021] [Indexed: 11/28/2022] Open

Affiliation(s)

Alfonso Monaco Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy
Ester Pantaleo Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy
Nicola Amoroso Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Farmacia - Scienze del Farmaco, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
Antonio Lacalamita National Institute of Gastroenterology "S. de Bellis", Research Hospital, 70013 Castellana Grotte (Bari), Italy
Claudio Lo Giudice Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
Adriano Fonzino Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy
Bruno Fosso Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
Ernesto Picardi Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
Sabina Tangaro Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento di Scienze del Suolo, della Pianta e degli Alimenti, Università degli Studi di Bari "Aldo Moro", Bari, Via G. Amendola 165, 70125 Bari, Italy
Graziano Pesole Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari "Aldo Moro", Via A. Orabona 4, 70125 Bari, Italy; Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari, Consiglio Nazionale delle Ricerche, Via G. Amendola 122/O, 70126 Bari, Italy
Roberto Bellotti Istituto Nazionale di Fisica Nucleare (INFN), Sezione di Bari, Via A. Orabona 4, 70125 Bari, Italy; Dipartimento Interateneo di Fisica "M. Merlin", Università degli Studi di Bari "Aldo Moro", Via G. Amendola 173, 70125 Bari, Italy

Two-stage improved Grey Wolf optimization algorithm for feature selection on high-dimensional classification. COMPLEX INTELL SYST 2021. [DOI: 10.1007/s40747-021-00452-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/20/2022]

Morgenstern JD, Rosella LC, Costa AP, de Souza RJ, Anderson LN. Perspective: Big Data and Machine Learning Could Help Advance Nutritional Epidemiology. Adv Nutr 2021;12:621-631. [PMID: 33606879 PMCID: PMC8166570 DOI: 10.1093/advances/nmaa183] [Citation(s) in RCA: 31] [Impact Index Per Article: 7.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/09/2020] [Revised: 11/04/2020] [Accepted: 12/29/2020] [Indexed: 01/09/2023] Open

Abstract

The field of nutritional epidemiology faces challenges posed by measurement error, diet as a complex exposure, and residual confounding. The objective of this perspective article is to highlight how developments in big data and machine learning can help address these challenges. New methods of collecting 24-h dietary recalls and recording diet could enable larger samples and more repeated measures to increase statistical power and measurement precision. In addition, use of machine learning to automatically classify pictures of food could become a useful complementary method to help improve precision and validity of dietary measurements. Diet is complex due to thousands of different foods that are consumed in varying proportions, fluctuating quantities over time, and differing combinations. Current dietary pattern methods may not integrate sufficient dietary variation, and most traditional modeling approaches have limited incorporation of interactions and nonlinearity. Machine learning could help better model diet as a complex exposure with nonadditive and nonlinear associations. Last, novel big data sources could help avoid unmeasured confounding by offering more covariates, including both omics and features derived from unstructured data with machine learning methods. These opportunities notwithstanding, application of big data and machine learning must be approached cautiously to ensure quality of dietary measurements, avoid overfitting, and confirm accurate interpretations. Greater use of machine learning and big data would also require substantial investments in training, collaborations, and computing infrastructure. Overall, we propose that judicious application of big data and machine learning in nutrition science could offer new means of dietary measurement, more tools to model the complexity of diet and its relations with diseases, and additional potential ways of addressing confounding.

Weight Feedback-Based Harmonic MDG-Ensemble Model for Prediction of Traffic Accident Severity. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11115072] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]

Reproducible Evaluation of Diffusion MRI Features for Automatic Classification of Patients with Alzheimer's Disease. Neuroinformatics 2021;19:57-78. [PMID: 32524428 DOI: 10.1007/s12021-020-09469-5] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]

Abstract

Diffusion MRI is the modality of choice to study alterations of white matter. In past years, various works have used diffusion MRI for automatic classification of Alzheimer's disease. However, classification performance obtained with different approaches is difficult to compare because of variations in components such as input data, participant selection, image preprocessing, feature extraction, feature rescaling (FR), feature selection (FS) and cross-validation (CV) procedures. Moreover, these studies are also difficult to reproduce because these different components are not readily available. In a previous work (Samper-González et al. 2018), we proposed an open-source framework for the reproducible evaluation of AD classification from T1-weighted (T1w) MRI and PET data. In the present paper, we first extend this framework to diffusion MRI data. Specifically, we add: conversion of diffusion MRI ADNI data into the BIDS standard and pipelines for diffusion MRI preprocessing and feature extraction. We then apply the framework to compare different components. First, FS has a positive impact on classification results: highest balanced accuracy (BA) improved from 0.76 to 0.82 for task CN vs AD. Secondly, voxel-wise features generally give better performance than regional features. Fractional anisotropy (FA) and mean diffusivity (MD) provided comparable results for voxel-wise features. Moreover, we observe that the poor performance obtained in tasks involving MCI was potentially caused by the small data samples, rather than by the data imbalance. Furthermore, no extensive classification difference exists for different degrees of smoothing and registration methods. Besides, we demonstrate that using non-nested validation of FS leads to unreliable and over-optimistic results: 5% up to 40% relative increase in BA. Lastly, with proper FR and FS, the performance of diffusion MRI features is comparable to that of T1w MRI. All the code of the framework and the experiments are publicly available: general-purpose tools have been integrated into the Clinica software package (www.clinica.run) and the paper-specific code is available at: https://github.com/aramis-lab/AD-ML.
