1
Jahangiri M, Kazemnejad A, Goldfeld KS, Daneshpour MS, Momen M, Mostafaei S, Khalili D, Akbarzadeh M. Leveraging mixed-effects regression trees for the analysis of high-dimensional longitudinal data to identify the low and high-risk subgroups: simulation study with application to genetic study. BioData Min 2025; 18:22. [PMID: 40108712 PMCID: PMC11924713 DOI: 10.1186/s13040-025-00437-w] [Received: 01/08/2025] [Accepted: 03/03/2025] [Indexed: 03/22/2025] Open
Abstract
BACKGROUND The linear mixed-effects model (LME) is a conventional parametric method mainly used for analyzing longitudinal and clustered data in genetic studies. Previous studies have shown that this model can be sensitive to parametric assumptions and offers lower predictive performance than non-parametric methods such as random effects-expectation maximization (RE-EM) and unbiased RE-EM regression tree algorithms. These longitudinal regression trees utilize classification and regression trees (CART) and conditional inference trees (Ctree) to estimate the fixed-effects components of the mixed-effects model. While CART is a well-known tree algorithm, it suffers from greediness. To mitigate this issue, we used the Evtree algorithm to estimate the fixed-effects part of the LME for handling longitudinal and clustered data in genome association studies. METHODS In this study, we propose a new non-parametric longitudinal-based algorithm called "Ev-RE-EM" for modeling a continuous response variable using the Evtree algorithm to estimate the fixed-effects part of the LME. We compared its predictive performance with other tree algorithms, such as RE-EM and unbiased RE-EM, with and without considering the structure for autocorrelation between errors within subjects to analyze the longitudinal data in the genetic study. The autocorrelation structures include a first-order autoregressive process, a compound symmetric structure with a constant correlation, and a general correlation matrix. The real data was obtained from the longitudinal Tehran cardiometabolic genetic study (TCGS). The data modeling used body mass index (BMI) as the phenotype and included predictor variables such as age, sex, and 25,640 single nucleotide polymorphisms (SNPs). RESULTS The results demonstrated that the predictive performance of Ev-RE-EM and unbiased RE-EM was very similar.
Additionally, the Ev-RE-EM algorithm generated smaller trees than the unbiased RE-EM algorithm, enhancing tree interpretability. CONCLUSION The results showed that the unbiased RE-EM and Ev-RE-EM algorithms outperformed the RE-EM algorithm. Since algorithm performance varies across datasets, researchers should test different algorithms on the dataset of interest and select the best-performing one. Accurately predicting and diagnosing an individual's genetic profile is crucial in medical studies. The model with the highest accuracy should be used to enhance understanding of the genetics of complex traits, improve disease prevention and diagnosis, and aid in treating complex human diseases.
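The two parametric within-subject autocorrelation structures named in the abstract can be illustrated with a minimal sketch. It assumes equally spaced visits; the function names and the value `rho = 0.5` are illustrative choices, not taken from the paper. The third option, a general correlation matrix, has no closed form because every pairwise correlation is estimated freely.

```python
def ar1_corr(n_times, rho):
    """First-order autoregressive: correlation decays as rho ** |i - j|."""
    return [[rho ** abs(i - j) for j in range(n_times)] for i in range(n_times)]


def compound_symmetry_corr(n_times, rho):
    """Compound symmetry: every pair of distinct visits shares one correlation."""
    return [[1.0 if i == j else rho for j in range(n_times)] for i in range(n_times)]


# Illustrative values only: four visits per subject, rho = 0.5.
ar1 = ar1_corr(4, 0.5)
cs = compound_symmetry_corr(4, 0.5)

for name, matrix in (("AR(1)", ar1), ("compound symmetry", cs)):
    print(name)
    for row in matrix:
        print("  ", row)
```

Under AR(1) the correlation between the first and last visit shrinks with the lag, whereas under compound symmetry it stays constant, which is the practical difference between the two assumptions.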
Affiliation(s)
- Mina Jahangiri
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Anoshirvan Kazemnejad
  - Department of Biostatistics, Faculty of Medical Sciences, Tarbiat Modares University, Tehran, Iran
- Keith S Goldfeld
  - Division of Biostatistics, Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- Maryam S Daneshpour
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mehdi Momen
  - Department of Surgical Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, Madison, WI, USA
- Shayan Mostafaei
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
- Davood Khalili
  - Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Mahdi Akbarzadeh
  - Cellular and Molecular Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2
Ali Z, Jamil Y, Anwar H, Sarfraz RA. Classification of e-waste using machine learning-assisted laser-induced breakdown spectroscopy. WASTE MANAGEMENT & RESEARCH : THE JOURNAL OF THE INTERNATIONAL SOLID WASTES AND PUBLIC CLEANSING ASSOCIATION, ISWA 2025; 43:408-420. [PMID: 38725243 DOI: 10.1177/0734242x241248730] [Indexed: 03/03/2025]
Abstract
Waste management and the economy are intertwined in various ways. Adopting sustainable waste management techniques can contribute to economic growth and resource conservation. Artificial intelligence (AI)-based classification is crucial for rapid and contactless classification of metals in electronic waste (e-waste) management. In the present work, five types of aluminium alloys were selected because of their extensive use in structural, electrical and thermotechnical functions in the electronics industry. Laser-induced breakdown spectroscopy (LIBS), a spectral identification technique, was employed in conjunction with machine learning (ML) classification models. Principal component analysis (PCA), an unsupervised ML technique, was unable to differentiate the LIBS data of the alloys. Supervised ML classifiers were then trained (with 10-fold cross-validation) on a randomly selected 80% and tested on the remaining 20% of the spectral data of each alloy to assess the classification capacity of each. In most of the tested variants of K nearest neighbour (kNN), the resulting accuracy was lower than 30%, but kNN ensembled with the random subspace method showed improved accuracy of up to 98%. This study revealed that an AI-based LIBS system can classify e-waste alloys rather effectively in a contactless mode and could potentially be connected with robotic systems, hence minimizing manual labour.
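The random subspace idea mentioned in the abstract can be sketched as follows: each kNN learner sees only a random subset of the spectral channels, and the ensemble takes a majority vote. This is a minimal illustration, not the authors' implementation; the function names, the toy four-channel "spectra", the alloy labels and all parameter values are assumptions.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Majority vote among the k nearest training rows, with Euclidean
    distance measured only on the selected feature (channel) indices."""
    def project(row):
        return [row[f] for f in features]

    target = project(x)
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(project(pair[0]), target),
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def random_subspace_knn(train_X, train_y, x, k=3, n_learners=15,
                        subspace_size=2, seed=0):
    """Ensemble of kNN learners, each restricted to a random feature subset."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_features = len(train_X[0])
    votes = Counter()
    for _ in range(n_learners):
        features = rng.sample(range(n_features), subspace_size)
        votes[knn_predict(train_X, train_y, x, k, features)] += 1
    return votes.most_common(1)[0][0]


# Toy, made-up data: two well-separated classes with four channels each.
train_X = [
    [0.1, 0.2, 0.0, 0.3], [0.0, 0.1, 0.2, 0.1], [0.3, 0.0, 0.1, 0.2],
    [9.8, 10.1, 9.9, 10.0], [10.2, 9.9, 10.1, 9.7], [9.9, 10.0, 9.8, 10.3],
]
train_y = ["alloy_A"] * 3 + ["alloy_B"] * 3

pred_low = random_subspace_knn(train_X, train_y, [0.2, 0.1, 0.1, 0.2])
pred_high = random_subspace_knn(train_X, train_y, [10.0, 9.9, 10.0, 10.1])
print(pred_low, pred_high)
```

Restricting each learner to a few channels can help when a full spectrum has many more channels than training samples, which is one plausible reason a plain kNN might struggle where the ensemble does not.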
Affiliation(s)
- Zahid Ali
  - Laser Spectroscopy Lab, Department of Physics, University of Agriculture Faisalabad, Pakistan
  - Department of Physics, University of Agriculture Faisalabad, Pakistan
- Yasir Jamil
  - Laser Spectroscopy Lab, Department of Physics, University of Agriculture Faisalabad, Pakistan
  - Department of Physics, University of Agriculture Faisalabad, Pakistan
- Hafeez Anwar
  - Department of Physics, University of Agriculture Faisalabad, Pakistan
- Raja Adil Sarfraz
  - Department of Chemistry, University of Agriculture, Faisalabad, Pakistan
3
Habibalahi A, Anwer AG, Knab A, Grey ST, Goldys EM, Campbell JM. Multispectral autofluorescence for label free classification of immune cell type and activation/polarization status. Scand J Immunol 2025; 101:e70004. [PMID: 39924799 PMCID: PMC11808199 DOI: 10.1111/sji.70004] [Received: 05/06/2024] [Revised: 11/21/2024] [Accepted: 01/13/2025] [Indexed: 02/11/2025]
Abstract
Evaluating immune status is a challenging and time-consuming process that involves analysing various biomarkers through numerous assays. The sensitive label-free technique of multispectral imaging of cell autofluorescence involves directly assessing the molecular composition of cells to gather biological information. Cells were cultured in RPMI 1640 modified media supplemented with penicillin-streptomycin and 10% foetal bovine serum at 37°C, with 5% CO2 and 95% humidity. Activation and differentiation were confirmed using immunofluorophores against relevant markers. Multispectral microscopy utilized defined spectral regions, which spanned the excitation (345-476 nm) and emission (414-675 nm) wavelength ranges. In total, 56 distinct spectral channels were applied. These channels cover the spectra of several fluorophores, notably NAD(P)H and flavins, whose concentrations depend on cellular metabolism. We identified distinct spectral signatures for characterizing cells from the Jurkat, Ramos, THP-1, and HL-60 immune cell lines. These signatures correspond to four major immune cell types: T cells (lymphocytes), B cells (lymphocytes), monocytes and neutrophils. Moreover, our investigation explored the potential identification of both activated and resting forms of these cells, including the discrimination of M0, M1 and M2 polarized macrophages. Classification accuracy ranged from 92% to 100% based on receiver operating characteristic area under the curve (ROC AUC) assessment. These results indicate that the multispectral evaluation of cell autofluorescence is applicable for characterization of immune status. This includes the assessment of cell types and their activation status, all achievable through a single non-invasive assay.
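For readers unfamiliar with the ROC AUC figure quoted above, the sketch below computes it for a binary problem as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The labels and scores are made-up values, and how the authors handled the multi-class setting (for example, one class versus the rest) is not stated in the abstract.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up classifier scores for illustration only.
perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
partial = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3])
print(perfect, partial)
```

The pairwise loop is quadratic in the number of examples; it is written this way for clarity rather than speed.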
Affiliation(s)
- Abbas Habibalahi
  - Graduate School of Biomedical Engineering, Faculty of Engineering, University of New South Wales, Sydney, New South Wales, Australia
  - ARC Centre of Excellence for Nanoscale Biophotonics, University of New South Wales, Sydney, New South Wales, Australia
- Ayad G. Anwer
  - Graduate School of Biomedical Engineering, Faculty of Engineering, University of New South Wales, Sydney, New South Wales, Australia
  - ARC Centre of Excellence for Nanoscale Biophotonics, University of New South Wales, Sydney, New South Wales, Australia
- Aline Knab
  - Graduate School of Biomedical Engineering, Faculty of Engineering, University of New South Wales, Sydney, New South Wales, Australia
  - ARC Centre of Excellence for Nanoscale Biophotonics, University of New South Wales, Sydney, New South Wales, Australia
- Shane T. Grey
  - Transplantation Immunology Laboratory, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - Translation Science Pillar, Garvan Institute of Medical Research, Darlinghurst, New South Wales, Australia
  - School of Biotechnology and Biomolecular Sciences, Faculty of Science, University of New South Wales, Sydney, New South Wales, Australia
- Ewa M. Goldys
  - Graduate School of Biomedical Engineering, Faculty of Engineering, University of New South Wales, Sydney, New South Wales, Australia
  - ARC Centre of Excellence for Nanoscale Biophotonics, University of New South Wales, Sydney, New South Wales, Australia
- Jared M. Campbell
  - Graduate School of Biomedical Engineering, Faculty of Engineering, University of New South Wales, Sydney, New South Wales, Australia
  - ARC Centre of Excellence for Nanoscale Biophotonics, University of New South Wales, Sydney, New South Wales, Australia
4
Kumar RG, Selmanovic E, Gilmore N, Spielman L, Li LM, Hoffman JM, Bodien YG, Snider SB, Freeman HJ, de Souza NL, Donald CLM, Edlow BL, Dams-O’Connor K. Distinct clinical phenotypes and their neuroanatomic correlates in chronic traumatic brain injury. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2025.01.27.25321200. [PMID: 39974133 PMCID: PMC11838966 DOI: 10.1101/2025.01.27.25321200] [Indexed: 02/21/2025]
Abstract
Accumulating evidence of heterogeneous long-term outcomes after traumatic brain injury (TBI) has challenged longstanding approaches to TBI outcome classification that are largely based on global functioning. A lack of studies with clinical and biomarker data from individuals living with chronic (>1 year post-injury) TBI has precluded refinement of long-term outcome classification ontology. Multimodal data in well-characterized TBI cohorts is required to understand the clinical phenotypes and biological underpinnings of persistent symptoms in the chronic phase of TBI. The present cross-sectional study leveraged data from 281 participants with chronic complicated mild-to-severe TBI in the Late Effects of Traumatic Brain Injury (LETBI) Study. Our primary objective was to develop and validate clinical phenotypes using data from 41 TBI measures spanning a comprehensive cognitive battery, motor testing, and assessments of mood, health, and functioning. We performed a 70/30% split of training (n=195) and validation (n=86) datasets and performed principal components analysis to reduce the dimensionality of data. We used Hierarchical Cluster Analysis on Principal Components with k-means consolidation to identify clusters, or phenotypes, with shared clinical features. Our secondary objective was to investigate differences in brain volume in seven cortical networks across clinical phenotypes in the subset of 168 participants with brain MRI data. We performed multivariable linear regression models adjusted for age, age-squared, sex, scanner, injury chronicity, injury severity, and training/validation set. In the training/validation sets, we observed four phenotypes: 1) mixed cognitive and mood/behavioral deficits (11.8%; 15.1% in the training and validation set, respectively); 2) predominant cognitive deficits (20.5%; 23.3%); 3) predominant mood/behavioral deficits (27.7%; 22.1%); and 4) few deficits across domains (40%; 39.5%). 
The predominant cognitive deficit phenotype had lower cortical volumes in executive control, dorsal attention, limbic, default mode, and visual networks, relative to the phenotype with few deficits. The predominant mood/behavioral deficit phenotype had lower volumes in dorsal attention, limbic, and visual networks, compared to the phenotype with few deficits. Contrary to expectation, we did not detect differences in network-specific volumes between the phenotypes with mixed deficits versus few deficits. We identified four clinical phenotypes and their neuroanatomic correlates in a well-characterized cohort of individuals with chronic TBI. TBI phenotypes defined by symptom clusters, as opposed to global functioning, could inform clinical trial stratification and treatment selection. Individuals with predominant cognitive and mood/behavioral deficits had reduced cortical volumes in specific cortical networks, providing insights into sensitive, though not specific, candidate imaging biomarkers of clinical symptom phenotypes after chronic TBI and potential targets for intervention.
Affiliation(s)
- Raj G. Kumar
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, NY
- Enna Selmanovic
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, NY
  - Nash Family Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, NY
- Natalie Gilmore
  - Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Lisa Spielman
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, NY
- Lucia M. Li
  - Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Department of Brain Sciences, Imperial College London, W12 0BZ, UK
- Jeanne M. Hoffman
  - Department of Rehabilitation Medicine, University of Washington School of Medicine, Seattle, WA
- Yelena G. Bodien
  - Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Department of Physical Medicine and Rehabilitation, Spaulding Rehabilitation Hospital and Harvard Medical School, Charlestown, MA
- Samuel B. Snider
  - Department of Neurology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA
- Holly J. Freeman
  - Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA
- Nicola L. de Souza
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, NY
- Brian L Edlow
  - Center for Neurotechnology and Neurorecovery, Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
  - Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital and Harvard Medical School, Charlestown, MA
- Kristen Dams-O’Connor
  - Department of Rehabilitation and Human Performance, Icahn School of Medicine at Mount Sinai, New York, NY
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY
5
Caron DP, Specht WL, Chen D, Wells SB, Szabo PA, Jensen IJ, Farber DL, Sims PA. Multimodal hierarchical classification of CITE-seq data delineates immune cell states across lineages and tissues. CELL REPORTS METHODS 2025; 5:100938. [PMID: 39814026 PMCID: PMC11840950 DOI: 10.1016/j.crmeth.2024.100938] [Received: 04/11/2024] [Revised: 08/21/2024] [Accepted: 12/09/2024] [Indexed: 01/18/2025]
Abstract
Single-cell RNA sequencing (scRNA-seq) is invaluable for profiling cellular heterogeneity and transcriptional states, but transcriptomic profiles do not always delineate subsets defined by surface proteins. Cellular indexing of transcriptomes and epitopes (CITE-seq) enables simultaneous profiling of single-cell transcriptomes and surface proteomes; however, accurate cell-type annotation requires a classifier that integrates multimodal data. Here, we describe multimodal classifier hierarchy (MMoCHi), a marker-based approach for accurate cell-type classification across multiple single-cell modalities that does not rely on reference atlases. We benchmark MMoCHi using sorted T lymphocyte subsets and annotate a cross-tissue human immune cell dataset. MMoCHi outperforms leading transcriptome-based classifiers and multimodal unsupervised clustering in its ability to identify immune cell subsets that are not readily resolved and to reveal subset markers. MMoCHi is designed for adaptability and can integrate annotation of cell types and developmental states across diverse lineages, samples, or modalities.
Affiliation(s)
- Daniel P Caron
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- William L Specht
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- David Chen
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Steven B Wells
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Peter A Szabo
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Isaac J Jensen
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
- Donna L Farber
  - Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Department of Surgery, Columbia University Irving Medical Center, New York, NY 10032, USA
- Peter A Sims
  - Department of Systems Biology, Columbia University Irving Medical Center, New York, NY 10032, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY 10032, USA
6
Tsanakas AT, Mueller YM, van de Werken HJG, Pujol Borrell R, Ouzounis CA, Katsikis PD. An explainable machine learning model for COVID-19 severity prognosis at hospital admission. INFORMATICS IN MEDICINE UNLOCKED 2025; 52:101602. [DOI: 10.1016/j.imu.2024.101602] [Indexed: 01/02/2025] Open
7
Letchumanan N, Hanaoka S, Takenaga T, Suzuki Y, Nakao T, Nomura Y, Yoshikawa T, Abe O. Predicting the risk of type 2 diabetes mellitus (T2DM) emergence in 5 years using mammography images: a comparison study between radiomics and deep learning algorithm. J Med Imaging (Bellingham) 2025; 12:014501. [PMID: 39776665 PMCID: PMC11702674 DOI: 10.1117/1.jmi.12.1.014501] [Received: 08/30/2024] [Revised: 11/20/2024] [Accepted: 12/01/2024] [Indexed: 01/11/2025] Open
Abstract
Purpose The prevalence of type 2 diabetes mellitus (T2DM) has been steadily increasing over the years. We aim to predict the occurrence of T2DM using mammography images within 5 years using two different methods and compare their performance. Approach We examined 312 samples, including 110 positive cases (developed T2DM after 5 years) and 202 negative cases (did not develop T2DM) using two different methods. In the first method, a radiomics-based approach, we utilized radiomics features and machine learning (ML) algorithms. The entire breast region was chosen as the region of interest for extracting radiomics features. Then, a binary breast image was created from which we extracted 668 features and analyzed them using various ML algorithms. In the second method, a complex convolutional neural network (CNN) with a modified ResNet architecture and various kernel sizes was applied to raw mammography images for the prediction task. A nested, stratified five-fold cross-validation was done for both parts A and B to compute accuracy, sensitivity, specificity, and area under the receiver operating curve (AUROC). Hyperparameter tuning was also done to enhance the model's performance and reliability. Results The radiomics approach's light gradient boosting model gave 68.9% accuracy, 30.7% sensitivity, 89.5% specificity, and 0.63 AUROC. The CNN method achieved an AUROC of 0.58 over 20 epochs. Conclusion Radiomics outperformed CNN by 0.05 in terms of AUROC. This may be due to the more straightforward interpretability and clinical relevance of predefined radiomics features compared with the complex, abstract features learned by CNNs.
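The relationship between the reported accuracy, sensitivity and specificity can be made concrete with a short sketch. The confusion-matrix counts below are hypothetical: they were chosen only to match the stated class sizes (110 positive, 202 negative) and are not reported in the paper, whose figures come from nested cross-validation.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true-positive rate
        "specificity": tn / (tn + fp),   # true-negative rate
    }


# Hypothetical counts for illustration: 110 positives, 202 negatives.
metrics = classification_metrics(tp=34, fn=76, tn=181, fp=21)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With an imbalanced sample like this one, a model can reach moderate accuracy mostly through high specificity while still missing most positive cases, which is why sensitivity is reported separately.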
Affiliation(s)
- Nishta Letchumanan
  - The University of Tokyo, Department of Radiology, Graduate School of Medicine, Tokyo, Japan
- Shouhei Hanaoka
  - The University of Tokyo Hospital, Department of Radiology, Tokyo, Japan
- Tomomi Takenaga
  - The University of Tokyo Hospital, Department of Radiology, Tokyo, Japan
- Yusuke Suzuki
  - The University of Tokyo Hospital, Department of Breast and Endocrine Surgery, Tokyo, Japan
- Takahiro Nakao
  - The University of Tokyo Hospital, Department of Computational Diagnostic Radiology and Preventive Medicine, Tokyo, Japan
- Yukihiro Nomura
  - The University of Tokyo Hospital, Department of Computational Diagnostic Radiology and Preventive Medicine, Tokyo, Japan
  - Chiba University, Center for Frontier Medical Engineering, Chiba, Japan
- Takeharu Yoshikawa
  - The University of Tokyo Hospital, Department of Computational Diagnostic Radiology and Preventive Medicine, Tokyo, Japan
- Osamu Abe
  - The University of Tokyo Hospital, Department of Radiology, Tokyo, Japan
8
Levy J, Dimambro M, Diallo A, Gui J, Shiner B, Levis M. Investigating the Differential Impact of Psychosocial Factors by Patient Characteristics and Demographics on Veteran Suicide Risk Through Machine Learning Extraction of Cross-Modal Interactions. PACIFIC SYMPOSIUM ON BIOCOMPUTING. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2025; 30:167-184. [PMID: 39670369 PMCID: PMC11747942 DOI: 10.1142/9789819807024_0013] [Indexed: 05/16/2025]
Abstract
Accurate prediction of suicide risk is crucial for identifying patients with elevated risk burden, helping ensure these patients receive targeted care. The US Department of Veterans Affairs' suicide prediction model primarily leverages structured electronic health records (EHR) data. This approach largely overlooks unstructured EHR, a data format that could be utilized to enhance predictive accuracy. This study aims to enhance suicide risk models' predictive accuracy by developing a model that incorporates both structured EHR predictors and semantic NLP-derived variables from unstructured EHR. XGBoost models were fit to predict suicide risk. The interactions identified by the model were extracted using SHAP, validated using logistic regression models, and added to a ridge regression model, which was subsequently compared with a ridge regression approach without interactions. By introducing a selection parameter, α, to balance the influence of structured (α=1) and unstructured (α=0) data, we found that intermediate α values achieved optimal performance across various risk strata, improved the performance of the ridge regression approach, and uncovered significant cross-modal interactions between psychosocial constructs and patient characteristics. These interactions highlight how psychosocial risk factors are influenced by individual patient contexts, potentially informing improved risk prediction methods and personalized interventions. Our findings underscore the importance of incorporating nuanced narrative data into predictive models and set the stage for future research that will expand the use of advanced machine learning techniques, including deep learning, to further refine suicide risk prediction methods.
Affiliation(s)
- Joshua Levy
  - Department of Computational Biomedicine, Cedars Sinai Medical Center, Los Angeles, CA, USA
- Monica Dimambro
  - White River Junction VA Medical Center, White River Junction, VT, USA
- Alos Diallo
  - Dartmouth College Geisel School of Medicine, Hanover, NH, USA
- Jiang Gui
  - Dartmouth College Geisel School of Medicine, Hanover, NH, USA
- Brian Shiner
  - White River Junction VA Medical Center, White River Junction, VT, USA
- Maxwell Levis
  - White River Junction VA Medical Center, White River Junction, VT, USA
9
Nyalala I, Jiayu Z, Zixuan C, Junlong C, Chen K. Online chicken carcass volume estimation using depth imaging and 3-D reconstruction. Poult Sci 2024; 103:104232. [PMID: 39284266 PMCID: PMC11419819 DOI: 10.1016/j.psj.2024.104232] [Received: 04/16/2024] [Revised: 08/08/2024] [Accepted: 08/13/2024] [Indexed: 09/27/2024] Open
Abstract
Variability in the size of slaughtered chickens remains a longstanding challenge in the standardization of the poultry industry. To address this issue, we present a novel approach that uses volume as a grading metric for chicken carcasses. This innovative method, unexplored in existing studies, employs real-time data capture of moving chicken carcasses on a production line using Kinect v2 depth imaging and 3-D reconstruction technologies. The captured depth images are processed into point clouds followed by 3-D reconstruction. Volume is calculated from the reconstructed models using the surface integration method, and additional 2-D and 3-D features are extracted as input parameters for machine learning models. Multiple regression models were evaluated, with the bagged tree model demonstrating superior performance, achieving an R² value of 0.9988, RMSE of 5.335, and ARE of 2.125%. Furthermore, our method showed remarkable efficiency with an average processing time of less than 1.6 seconds per carcass. These results indicate that our novel approach fills a critical gap in existing automated grading methodologies by offering both accuracy and efficiency. This validates the applicability of depth imaging, 3-D reconstruction, and machine learning for estimating chicken carcass volume with high precision, thereby enabling a more comprehensive, efficient, and reliable chicken carcass grading system.
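One common way to obtain a volume from a reconstructed surface is to sum signed tetrahedron volumes over the triangles of a closed mesh, which follows from the divergence theorem. The sketch below illustrates that general technique; whether it matches the authors' "surface integration method" in detail is an assumption, and the unit cube stands in for a reconstructed carcass model.

```python
def mesh_volume(vertices, triangles):
    """Volume enclosed by a closed, consistently oriented triangle mesh.

    Each triangle (a, b, c) forms a tetrahedron with the origin whose signed
    volume is a . (b x c) / 6; the signed volumes sum to the enclosed volume.
    """
    total = 0.0
    for i, j, k in triangles:
        ax, ay, az = vertices[i]
        bx, by, bz = vertices[j]
        cx, cy, cz = vertices[k]
        # Scalar triple product a . (b x c)
        total += (ax * (by * cz - bz * cy)
                  - ay * (bx * cz - bz * cx)
                  + az * (bx * cy - by * cx))
    return abs(total) / 6.0


# Unit cube as a stand-in mesh: 8 vertices, 12 outward-facing triangles.
cube_vertices = [
    (0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
    (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1),
]
cube_triangles = [
    (0, 2, 1), (0, 3, 2),  # bottom (z = 0)
    (4, 5, 6), (4, 6, 7),  # top (z = 1)
    (0, 1, 5), (0, 5, 4),  # front (y = 0)
    (3, 7, 6), (3, 6, 2),  # back (y = 1)
    (0, 4, 7), (0, 7, 3),  # left (x = 0)
    (1, 2, 6), (1, 6, 5),  # right (x = 1)
]

cube_volume = mesh_volume(cube_vertices, cube_triangles)
print(cube_volume)
```

The result is only meaningful when the mesh is watertight and all triangles share a consistent winding, so a reconstruction pipeline would normally need hole filling before this step.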
Affiliation(s)
- Innocent Nyalala
  - College of Engineering, Nanjing Agricultural University, Nanjing, Jiangsu, 210031, China
  - Faculty of Science, Department of Computer Science, Egerton University, Njoro, Kenya
- Zhang Jiayu
  - College of Engineering, Nanjing Agricultural University, Nanjing, Jiangsu, 210031, China
- Chen Zixuan
  - Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing, Jiangsu, 210014, China
- Chen Junlong
  - Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing, Jiangsu, 210014, China
- Kunjie Chen
  - College of Engineering, Nanjing Agricultural University, Nanjing, Jiangsu, 210031, China
10
Safdar S, Jefferson AJ, Costello DM, Blinn A. Urbanization and Suspended Sediment Transport Dynamics: A Comparative Study of Watersheds with Varying Degree of Urbanization Using Concentration-Discharge Hysteresis. ACS ES&T WATER 2024; 4:3904-3917. [PMID: 39296623 PMCID: PMC11407305 DOI: 10.1021/acsestwater.4c00214] [Received: 03/11/2024] [Revised: 07/18/2024] [Accepted: 07/19/2024] [Indexed: 09/21/2024]
Abstract
Suspended sediment is a critical water quality parameter and an indicator of geomorphic processes, but suspended sediment dynamics in urban streams may not conform to the first-flush model widely used for other pollutants. We analyzed discharge and turbidity data for 367 events from three urban watersheds (impervious cover 16-45%) in Cleveland, Ohio (USA). Less intensely urbanized watersheds exhibit higher turbidity compared to that of the most highly urbanized watershed. Proportionally, more counterclockwise hysteresis is observed in the two less urbanized watersheds, and more clockwise hysteresis occurs in the highly urbanized watershed. However, hysteresis patterns are driven by different mechanisms in each watershed, and geomorphic analysis was critical to identifying the underlying mechanisms. In the least urbanized watershed, spatial rainfall variability controls sediment hysteresis. In the intermediate watershed, the erosion of upstream weathered shale banks during dry periods plays a significant role in the sediment supply and shaping hysteresis. In the most urbanized watershed, high eroding banks in downstream reaches lead to more frequent clockwise hysteresis. Overall, we suggest that as the impervious surfaces increase, the availability of instream sediments (bed and banks) plays an increased role in suspended sediment dynamics, and geomorphology remains essential for guiding management decisions.
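The direction of a concentration-discharge loop can be illustrated with the signed area of the loop traced in the (discharge, turbidity) plane, computed with the shoelace formula. This is a generic sketch and not necessarily the hysteresis metric used in the study; the event values are made up.

```python
def loop_direction(discharge, turbidity):
    """Classify an event loop by the sign of its shoelace (signed) area.

    Points are taken in time order with discharge on the x-axis and turbidity
    on the y-axis. A negative area means the loop is traced clockwise
    (turbidity higher on the rising limb); a positive area means
    counterclockwise.
    """
    n = len(discharge)
    twice_area = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the loop
        twice_area += discharge[i] * turbidity[j] - discharge[j] * turbidity[i]
    area = twice_area / 2.0
    if area < 0:
        return "clockwise", area
    if area > 0:
        return "counterclockwise", area
    return "no hysteresis", area


# Made-up storm event: turbidity peaks on the rising limb of the hydrograph.
q = [1.0, 2.0, 3.0, 2.0]
c = [1.0, 3.0, 3.0, 1.5]

direction, area = loop_direction(q, c)
print(direction, area)
```

Complex events with figure-eight loops can have partly cancelling areas, so a single sign is a coarse summary rather than a full description of the event.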
Affiliation(s)
- Suffiyan Safdar
  - Department of Civil & Environmental Engineering, University of Vermont, 33 Colchester Avenue, Burlington, Vermont 05405, United States
- Anne J Jefferson
  - Rubenstein School of Environment and Natural Resources, University of Vermont, 81 Carrigan Drive, Burlington, Vermont 05405, United States
- David M Costello
  - Department of Biological Sciences, Kent State University, 256 Cunningham Hall, Kent Campus, Ohio 44242, United States
- Andrew Blinn
  - Department of Biological Sciences, Kent State University, 256 Cunningham Hall, Kent Campus, Ohio 44242, United States
| |
|
11
|
Zhang R, Zhu H, Chen M, Sang W, Lu K, Li Z, Wang C, Zhang L, Yin FF, Yang Z. A dual-radiomics model for overall survival prediction in early-stage NSCLC patient using pre-treatment CT images. Front Oncol 2024; 14:1419621. [PMID: 39206157 PMCID: PMC11349529 DOI: 10.3389/fonc.2024.1419621] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 07/26/2024] [Indexed: 09/04/2024] Open
Abstract
Introduction Radiation therapy (RT) is one of the primary treatment options for early-stage non-small cell lung cancer (ES-NSCLC). Therefore, accurately predicting the overall survival (OS) rate following radiotherapy is crucial for implementing personalized treatment strategies. This work aims to develop a dual-radiomics (DR) model to (1) predict 3-year OS in ES-NSCLC patients receiving RT using pre-treatment CT images, and (2) provide explanations between feature importance and model prediction performance. Methods The publicly available TCIA Lung1 dataset with 132 ES-NSCLC patients who received RT was studied: 89/43 patients in the under/over 3-year OS group. For each patient, two types of radiomic features were examined: 56 handcrafted radiomic features (HRFs) extracted within gross tumor volume, and 512 image deep features (IDFs) extracted using a pre-trained U-Net encoder. They were combined as inputs to an explainable boosting machine (EBM) model for OS prediction. The EBM's mean absolute scores for HRFs and IDFs were used as feature importance explanations. To evaluate identified feature importance, the DR model was compared with EBM using either (1) key or (2) non-key feature type only. Comparison studies with other models, including support vector machine (SVM) and random forest (RF), were also included. The performance was evaluated by the area under the receiver operating characteristic curve (AUCROC), accuracy, sensitivity, and specificity with a 100-fold Monte Carlo cross-validation. Results The DR model showed the highest performance in predicting 3-year OS (AUCROC=0.81 ± 0.04), and EBM scores suggested that IDFs showed significantly greater importance (normalized mean score=0.0019) than HRFs (score=0.0008). The comparison studies showed that EBM with key feature type (IDFs-only) demonstrated comparable AUCROC results (0.81 ± 0.04), while EBM with non-key feature type (HRFs-only) showed limited AUCROC (0.64 ± 0.10). 
The results suggested that feature importance score identified by EBM is highly correlated with OS prediction performance. Both SVM and RF models were unable to explain key feature type while showing limited overall AUCROC=0.66 ± 0.07 and 0.77 ± 0.06, respectively. Accuracy, sensitivity, and specificity showed a similar trend. Discussion In conclusion, a DR model was successfully developed to predict ES-NSCLC OS based on pre-treatment CT images. The results suggested that the feature importance from DR model is highly correlated to the model prediction power.
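As a rough illustration of the evaluation scheme named in this abstract, the sketch below shows repeated random train/test splitting (Monte Carlo cross-validation) together with a rank-based AUCROC. It is not the authors' code: the function names, the 30% test fraction, and the seed are assumptions made for the example, and the abstract does not state the split ratio that was actually used.

```python
import random


def auc_roc(labels, scores):
    """AUCROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def monte_carlo_splits(n_samples, test_fraction=0.3, n_repeats=100, seed=0):
    """Yield (train_indices, test_indices) for independent random splits.

    test_fraction and seed are illustrative assumptions, not values from the abstract.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield shuffled[n_test:], shuffled[:n_test]
```

In use, a model would be refit on each training split and scored on the matching test split; the mean and standard deviation of the per-split AUCROC values would then give a summary in the form "0.81 ± 0.04".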
Affiliation(s)
- Rihui Zhang
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| | - Haiming Zhu
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| | - Minbin Chen
- Department of Radiotherapy & Oncology, The First People’s Hospital of Kunshan, Kunshan, Jiangsu, China
| | - Weiwei Sang
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| | - Ke Lu
- Department of Radiation Oncology, Duke University, Durham, NC, United States
| | - Zhen Li
- Radiation Oncology Department, Shanghai Sixth People’s Hospital, Shanghai, China
| | - Chunhao Wang
- Department of Radiation Oncology, Duke University, Durham, NC, United States
| | - Lei Zhang
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| | - Fang-Fang Yin
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| | - Zhenyu Yang
- Medical Physics Graduate Program, Duke Kunshan University, Kunshan, Jiangsu, China
| |
|
12
|
Mullen AD, Armstrong SE, Talbert J, Bumgardner VKC. CLASSify: A Web-Based Tool for Machine Learning. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2024; 2024:364-373. [PMID: 38827105 PMCID: PMC11141843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.
|
13
|
Li Y, Logan N, Quinn B, Hong Y, Birse N, Zhu H, Haughey S, Elliott CT, Wu D. Fingerprinting black tea: When spectroscopy meets machine learning a novel workflow for geographical origin identification. Food Chem 2024; 438:138029. [PMID: 38006696 DOI: 10.1016/j.foodchem.2023.138029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2023] [Revised: 10/29/2023] [Accepted: 11/14/2023] [Indexed: 11/27/2023]
Abstract
Food fraud, along with many challenges to the integrity and sustainability, threatens the prosperity of businesses and society as a whole. Tea is the second most commonly consumed non-alcoholic beverage globally. Challenges to tea authenticity require the development of highly efficient and rapid solutions to improve supply chain transparency. This study has produced an innovative workflow for black tea geographical indications (GI) discrimination based on non-targeted spectroscopic fingerprinting techniques. A total of 360 samples originating from nine GI regions worldwide were analysed by Fourier Transform Infrared (FTIR) and Near Infrared spectroscopy. Machine learning algorithms (k-nearest neighbours and support vector machine models) applied to the test data greatly improved the GI identification achieving 100% accuracy using FTIR. This workflow will provide a low-cost and user-friendly solution for on-site and real-time determination of black tea geographical origin along supply chains.
Affiliation(s)
- Yicong Li
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Natasha Logan
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Brian Quinn
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Yunhe Hong
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Nicholas Birse
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Hao Zhu
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Simon Haughey
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK
| | - Christopher T Elliott
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK; School of Food Science and Technology, Faculty of Science and Technology, Thammasat University (Rangsit Campus), Khlong Luang, Pathum Thani 12120, Thailand
| | - Di Wu
- National Measurement Laboratory: Centre of Excellence in Agriculture and Food Integrity, Institute for Global Food Security, School of Biological Sciences, Queen's University Belfast, 19 Chlorine Gardens, Belfast, Northern Ireland BT9 5DL, UK.
| |
|
14
|
Caron DP, Specht WL, Chen D, Wells SB, Szabo PA, Jensen IJ, Farber DL, Sims PA. Multimodal hierarchical classification of CITE-seq data delineates immune cell states across lineages and tissues. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.07.06.547944. [PMID: 37461466 PMCID: PMC10350048 DOI: 10.1101/2023.07.06.547944] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 07/27/2023]
Abstract
Single-cell RNA sequencing (scRNA-seq) is invaluable for profiling cellular heterogeneity and dissecting transcriptional states, but transcriptomic profiles do not always delineate subsets defined by surface proteins, as in cells of the immune system. Cellular Indexing of Transcriptomes and Epitopes (CITE-seq) enables simultaneous profiling of single-cell transcriptomes and surface proteomes; however, accurate cell type annotation requires a classifier that integrates multimodal data. Here, we describe MultiModal Classifier Hierarchy (MMoCHi), a marker-based approach for classification, reconciling gene and protein expression without reliance on reference atlases. We benchmark MMoCHi using sorted T lymphocyte subsets and annotate a cross-tissue human immune cell dataset. MMoCHi outperforms leading transcriptome-based classifiers and multimodal unsupervised clustering in its ability to identify immune cell subsets that are not readily resolved and to reveal novel subset markers. MMoCHi is designed for adaptability and can integrate annotation of cell types and developmental states across diverse lineages, samples, or modalities.
Affiliation(s)
- Daniel P. Caron
- Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY, USA
| | - William L. Specht
- Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY, USA
| | - David Chen
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Steven B. Wells
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
| | - Peter A. Szabo
- Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY, USA
| | - Isaac J. Jensen
- Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY, USA
| | - Donna L. Farber
- Department of Microbiology and Immunology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Surgery, Columbia University Irving Medical Center, New York, NY, USA
| | - Peter A. Sims
- Department of Systems Biology, Columbia University Irving Medical Center, New York, NY, USA
- Department of Biochemistry and Molecular Biophysics, Columbia University Irving Medical Center, New York, NY, USA
| |
|
15
|
Nazari E, Naderi H, Tabadkani M, ArefNezhad R, Farzin AH, Dashtiahangar M, Khazaei M, Ferns GA, Mehrabian A, Tabesh H, Avan A. Breast cancer prediction using different machine learning methods applying multi factors. J Cancer Res Clin Oncol 2023; 149:17133-17146. [PMID: 37773467 DOI: 10.1007/s00432-023-05388-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 09/01/2023] [Indexed: 10/01/2023]
Abstract
OBJECTIVE Breast cancer (BC) is a multifactorial disease and is one of the most common cancers globally. This study aimed to compare different machine learning (ML) techniques to develop a comprehensive breast cancer risk prediction model based on features of various factors. METHODS The population sample contained 810 records (115 cancer patients and 695 healthy individuals). Forty-five of the 85 attributes were selected based on expert opinion. The selected attributes cover genetic, biochemical, biomarker, gender, demographic, and pathological factors. Thirteen machine learning models were trained with the proposed attributes, and the attribute coefficients and internal relationships were calculated. RESULT Compared with the other methods, random forest (RF) had higher performance (accuracy 99.26%, precision 99%, and area under the curve (AUC) 99%). The results of assessing the impact and correlation of variables using the RF method based on PCA indicated that pathology, biomarker, biochemistry, gene, and demographic factors, with coefficients of 0.35, 0.23, 0.15, 0.14, and 0.13 respectively, affected the risk of BC (r2 = 0.54). CONCLUSION Breast cancer has several risk factors. Medical experts use these risk factors for early diagnosis. Therefore, identifying related risk factors and their effect can increase the accuracy of diagnosis. Considering the broad features for predicting breast cancer leads to the development of a comprehensive prediction model. In this study, a breast cancer prediction model with 99.3% accuracy was developed using the RF technique, based on multifactorial features.
Affiliation(s)
- Elham Nazari
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Hamid Naderi
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Mahla Tabadkani
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Reza ArefNezhad
- Halal Research Center of IRI, FDA, Tehran, Iran
- Department of Anatomy, School of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
| | - Majid Khazaei
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
| | - Gordon A Ferns
- Division of Medical Education, Brighton & Sussex Medical School, Falmer, Brighton, BN1 9PH, Sussex, UK
| | - Amin Mehrabian
- Warwick Medical School, University of Warwick, Coventry, UK
| | - Hamed Tabesh
- Faculty of Medicine, Department of Medical Informatics, Mashhad University of Medical Sciences, Mashhad, Iran.
| | - Amir Avan
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran.
- Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Brisbane, QLD, Australia.
- College of Medicine, University of Warith Al-Anbiyaa, Karbala, Iraq.
| |
|
16
|
Liu Y, Li J, Pang Y, Nie D, Yap PT. The Devil is in the Upsampling: Architectural Decisions Made Simpler for Denoising with Deep Image Prior. PROCEEDINGS. IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION 2023; 2023:12374-12383. [PMID: 38726039 PMCID: PMC11078028 DOI: 10.1109/iccv51070.2023.01140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2024]
Abstract
Deep Image Prior (DIP) shows that some network architectures inherently tend towards generating smooth images while resisting noise, a phenomenon known as spectral bias. Image denoising is a natural application of this property. Although denoising with DIP mitigates the need for large training sets, two often intertwined practical challenges need to be overcome: architectural design and noise fitting. Existing methods either handcraft or search for suitable architectures from a vast design space, due to the limited understanding of how architectural choices affect the denoising outcome. In this study, we demonstrate from a frequency perspective that unlearnt upsampling is the main driving force behind the denoising phenomenon with DIP. This finding leads to straightforward strategies for identifying a suitable architecture for every image without laborious search. Extensive experiments show that the estimated architectures achieve superior denoising results than existing methods with up to 95% fewer parameters. Thanks to this under-parameterization, the resulting architectures are less prone to noise-fitting.
Affiliation(s)
- Yilin Liu
- University of North Carolina at Chapel Hill
| | - Jiang Li
- University of North Carolina at Chapel Hill
| | - Dong Nie
- University of North Carolina at Chapel Hill
| |
|
17
|
Kong X, Lin K, Wu G, Tao X, Zhai X, Lv L, Dong D, Zhu Y, Yang S. Machine Learning Techniques Applied to the Study of Drug Transporters. Molecules 2023; 28:5936. [PMID: 37630188 PMCID: PMC10459831 DOI: 10.3390/molecules28165936] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/30/2023] [Revised: 07/27/2023] [Accepted: 08/02/2023] [Indexed: 08/27/2023] Open
Abstract
With the advancement of computer technology, machine learning-based artificial intelligence technology has been increasingly integrated and applied in the fields of medicine, biology, and pharmacy, thereby facilitating their development. Transporters have important roles in influencing drug resistance, drug-drug interactions, and tissue-specific drug targeting. The investigation of drug transporter substrates and inhibitors is a crucial aspect of pharmaceutical development. However, long duration and high expenses pose significant challenges in the investigation of drug transporters. In this review, we discuss the present situation and challenges encountered in applying machine learning techniques to investigate drug transporters. The transporters involved include ABC transporters (P-gp, BCRP, MRPs, and BSEP) and SLC transporters (OAT, OATP, OCT, MATE1,2-K, and NET). The aim is to offer a point of reference for and assistance with the progression of drug transporter research, as well as the advancement of more efficient computer technology. Machine learning methods are valuable and attractive for helping with the study of drug transporter substrates and inhibitors, but continuous efforts are still needed to develop more accurate and reliable predictive models and to apply them in the screening process of drug development to improve efficiency and success rates.
Affiliation(s)
- Xiaorui Kong
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Kexin Lin
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Gaolei Wu
- Department of Pharmacy, Dalian Women and Children’s Medical Group, Dalian 116024, China;
| | - Xufeng Tao
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Xiaohan Zhai
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Linlin Lv
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Deshi Dong
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Yanna Zhu
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| | - Shilei Yang
- Department of Pharmacy, First Affiliated Hospital of Dalian Medical University, Dalian 116011, China; (X.K.); (K.L.); (X.T.); (X.Z.); (L.L.); (D.D.)
| |
|
18
|
Sahour S, Khanbeyki M, Gholami V, Sahour H, Kahvazade I, Karimi H. Evaluation of machine learning algorithms for groundwater quality modeling. ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:46004-46021. [PMID: 36715809 DOI: 10.1007/s11356-023-25596-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 01/24/2023] [Indexed: 06/18/2023]
Abstract
Groundwater quality is typically measured through water sampling and lab analysis. The field-based measurements are costly and time-consuming when applied over a large domain. In this study, we developed a machine learning-based framework to map groundwater quality in an unconfined aquifer in the north of Iran. Groundwater samples were provided from 248 monitoring wells across the region. The groundwater quality index (GWQI) in each well was measured and classified into four classes: very poor, poor, good, and excellent, according to their cut-off values. Factors affecting groundwater quality, including distance to industrial centers, distance to residential areas, population density, aquifer transmissivity, precipitation, evaporation, geology, and elevation, were identified and prepared in the GIS environment. Six machine learning classifiers, including extreme gradient boosting (XGB), random forest (RF), support vector machine (SVM), artificial neural networks (ANN), k-nearest neighbor (KNN), and Gaussian classifier model (GCM), were used to establish relationships between GWQI and its controlling factors. The algorithms were evaluated using the receiver operating characteristic curve (ROC) and statistical efficiencies (overall accuracy, precision, recall, and F-1 score). Accuracy assessment showed that ML algorithms provided high accuracy in predicting groundwater quality. However, RF was selected as the optimum model given its higher accuracy (overall accuracy, precision, and recall = 0.92; ROC = 0.95). The trained RF model was used to map GWQI classes across the entire region. Results showed that the poor GWQI class is dominant in the study area (covering 66% of the study area), followed by good (19% of the area), very poor (14% of the area), and excellent (< 1% of the area) classes. An area of very poor GWQI was observed in the north. 
Feature analysis indicated that the distance to industrial locations is the main factor affecting groundwater quality in the region. The study provides a cost-effective methodology in groundwater quality modeling that can be duplicated in other regions with similar hydrological and geological settings.
Affiliation(s)
| | - Matin Khanbeyki
- Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Vahid Gholami
- Department of Range and Watershed Management and Dept. of Water Eng. and Environment, Faculty of Natural Resources, University of Guilan, Sowmeh Sara 1144, Guilan, Iran.
| | - Hossein Sahour
- Department of Geological and Environmental Sciences, Western Michigan University, Kalamazoo, MI, 49008, USA
| | - Irene Kahvazade
- Department of Computer Sciences, Western Michigan University, Kalamazoo, MI, 49008, USA
| | - Hadi Karimi
- Department of Geological and Environmental Sciences, Western Michigan University, Kalamazoo, MI, 49008, USA
| |
|
19
|
Bukhbinder AS, Hinojosa M, Harris K, Li X, Farrell CM, Shyer M, Goodwin N, Anjum S, Hasan O, Cooper S, Sciba L, Vargas A, Hunter DH, Ortiz GJ, Chung K, Cui L, Zhang GQ, Fisher-Hoch SP, McCormick JB, Schulz PE. Population-Based Mini-Mental State Examination Norms in Adults of Mexican Heritage in the Cameron County Hispanic Cohort. J Alzheimers Dis 2023; 92:1323-1339. [PMID: 36872776 DOI: 10.3233/jad-220934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
BACKGROUND Accurately identifying cognitive changes in Mexican American (MA) adults using the Mini-Mental State Examination (MMSE) requires knowledge of population-based norms for the MMSE, a scale which has widespread use in research settings. OBJECTIVE To describe the distribution of MMSE scores in a large cohort of MA adults, assess the impact of MMSE requirements on their clinical trial eligibility, and explore which factors are most strongly associated with their MMSE scores. METHODS Visits between 2004-2021 in the Cameron County Hispanic Cohort were analyzed. Eligible participants were ≥18 years old and of Mexican descent. MMSE distributions before and after stratification by age and years of education (YOE) were assessed, as was the proportion of trial-aged (50-85-year-old) participants with MMSE <24, a minimum MMSE cutoff most frequently used in Alzheimer's disease (AD) clinical trials. As a secondary analysis, random forest models were constructed to estimate the relative association of the MMSE with potentially relevant variables. RESULTS The mean age of the sample set (n = 3,404) was 44.4 (SD, 16.0) years old and 64.5% female. Median MMSE was 28 (IQR, 28-29). The percentage of trial-aged participants (n = 1,267) with MMSE <24 was 18.6%; 54.3% among the subset with 0-4 YOE (n = 230). The five variables most associated with the MMSE in the study sample were education, age, exercise, C-reactive protein, and anxiety. CONCLUSION The minimum MMSE cutoffs in most phase III prodromal-to-mild AD trials would exclude a significant proportion of trial-aged participants in this MA cohort, including over half of those with 0-4 YOE.
Affiliation(s)
- Avram S Bukhbinder
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA; Division of Pediatric Neurology, Massachusetts General Hospital, Boston, MA
| | - Miriam Hinojosa
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Kristofer Harris
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Xiaojin Li
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Christine M Farrell
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Madison Shyer
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nathan Goodwin
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Sahar Anjum
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Omar Hasan
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Susan Cooper
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Lois Sciba
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Amanda Vargas
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - David H Hunter
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Guadalupe J Ortiz
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Karen Chung
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Licong Cui
- School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Guo-Qiang Zhang
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA; School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Susan P Fisher-Hoch
- Department of Epidemiology, Human Genetics & Environmental Sciences, UTHealth School of Public Health, Brownsville, TX, USA
| | - Joseph B McCormick
- Department of Epidemiology, Human Genetics & Environmental Sciences, UTHealth School of Public Health, Brownsville, TX, USA
| | - Paul E Schulz
- Department of Neurology, McGovern Medical School, University of Texas Health Science Center at Houston, Houston, TX, USA
| |
|
20
|
Ali Z, Alturise F, Alkhalifah T, Khan YD. IGPred-HDnet: Prediction of Immunoglobulin Proteins Using Graphical Features and the Hierarchal Deep Learning-Based Approach. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:2465414. [PMID: 36744119 PMCID: PMC9891831 DOI: 10.1155/2023/2465414] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 09/16/2022] [Accepted: 10/12/2022] [Indexed: 01/26/2023]
Abstract
Motivation. Immunoglobulin proteins (IGP) (also called antibodies) are glycoproteins that act as B-cell receptors against external or internal antigens like viruses and bacteria. IGPs play a significant role in diverse cellular processes ranging from adhesion to cell recognition. IGP identifications via the in-silico approach are faster and more cost-effective than wet-lab technological methods. Methods. In this study, we developed an intelligent theoretical deep learning framework, "IGPred-HDnet", for the discrimination of IGPs and non-IGPs. Three types of promising descriptors, namely feature extraction based on graphical and statistical features (FEGS), amphiphilic pseudo-amino acid composition (Amp-PseAAC), and dipeptide composition (DPC), are used to extract the graphical, physicochemical, and sequential features. Next, the extracted attributes are evaluated through machine learning, i.e., decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), and hierarchical deep network (HDnet) classifiers. The proposed predictor IGPred-HDnet was trained and tested using a 10-fold cross-validation and independent test. Results and Conclusion. The success rates of IGPred-HDnet on the training and independent datasets (Dtrain, Dtest) are ACC = 98.00% and 99.10% in terms of accuracy (ACC), and MCC = 0.958 and 0.980 in terms of Matthews correlation coefficient (MCC), respectively. The empirical outcomes demonstrate that the IGPred-HDnet model efficacy on both datasets using the novel FEGS feature and HDnet algorithm achieved superior predictions to other existing computational models. We hope this research will provide great insights into the large-scale identification of IGPs and pharmaceutical companies in new drug design.
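For readers unfamiliar with the two metrics reported in this abstract, the following minimal sketch computes accuracy and the Matthews correlation coefficient from the four cells of a binary confusion matrix. The function names and the example counts are hypothetical; they are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only.
acc = accuracy(tp=45, tn=45, fp=5, fn=5)
mcc = matthews_corrcoef(tp=45, tn=45, fp=5, fn=5)
```

Unlike accuracy, MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy when the classes may be imbalanced.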
Affiliation(s)
- Zakir Ali
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
| | - Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
| | - Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
| | - Yaser Daanial Khan
- Department of Computer Science, School of Science and Technology, University of Management and Technology, Lahore, Pakistan
|
21
|
Alcañiz A, Lindfors AV, Zeman M, Ziar H, Isabella O. Effect of Climate on Photovoltaic Yield Prediction Using Machine Learning Models. GLOBAL CHALLENGES (HOBOKEN, NJ) 2023; 7:2200166. [PMID: 36618102 PMCID: PMC9818063 DOI: 10.1002/gch2.202200166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 09/28/2022] [Indexed: 06/17/2023]
Abstract
Machine learning is emerging as a major solution for photovoltaic (PV) power prediction. Despite the abundant literature, the effect of climate on yield predictions using machine learning is unknown. This work aims to find climatic trends by predicting the power of 48 PV systems around the world, equally divided into four climates. An extensive data gathering process is performed and open-data sources are prioritized. A website, www.tudelft.nl/open-source-pv-power-databases, has been created listing all open data sources found, for use in future research. Five machine learning algorithms and a baseline one have been trained for each PV system. Results show that the performance ranking of the algorithms is independent of climate. Systems in dry climates show on average the lowest normalized root mean squared error (NRMSE), 47.6%, while those in tropical climates present the highest, 60.2%. In mild and continental climates the NRMSE is 51.6% and 54.5%, respectively. When using a model trained in one climate to predict the power of a system located in another climate, on average systems located in cold climates show a lower generalization error, with an additional NRMSE as low as 5.6% depending on the climate of the test set. Robustness evaluations were also conducted, which increase the validity of the results.
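The NRMSE figures quoted above can be read with a small worked sketch. The abstract does not state which normalization the authors use; the version below divides the RMSE by the mean of the observed values, which is an assumption, and the function name is hypothetical.

```python
import math


def nrmse_percent(observed, predicted):
    """RMSE divided by the mean of the observed values, in percent.

    Normalizing by the mean is an assumption made for this sketch;
    other conventions divide by the range or by installed capacity.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    if mean_observed == 0:
        raise ValueError("mean of observed values is zero")
    return 100.0 * math.sqrt(mse) / mean_observed


# Hypothetical power readings in kW (not data from the paper).
observed = [1.0, 3.0]
predicted = [2.0, 2.0]
print(nrmse_percent(observed, predicted))
```

Because the error is scaled by the typical output level, values from systems of different sizes can be compared, which is what the cross-climate comparison in the abstract relies on.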
Affiliation(s)
- Alba Alcañiz
- Photovoltaic Materials and Devices Group, Delft University of Technology, Mekelweg 4, Delft 2628 CD, The Netherlands
| | - Anders V. Lindfors
- Finnish Meteorological Institute, Meteorological Research, Erik Palménin aukio 1, Helsinki 00560, Finland
| | - Miro Zeman
- Photovoltaic Materials and Devices Group, Delft University of Technology, Mekelweg 4, Delft 2628 CD, The Netherlands
| | - Hesan Ziar
- Photovoltaic Materials and Devices Group, Delft University of Technology, Mekelweg 4, Delft 2628 CD, The Netherlands
| | - Olindo Isabella
- Photovoltaic Materials and Devices Group, Delft University of Technology, Mekelweg 4, Delft 2628 CD, The Netherlands
|
22
|
Mlambo F, Chironda C, George J. Risk Stratification of COVID-19 Using Routine Laboratory Tests: A Machine Learning Approach. Infect Dis Rep 2022; 14:900-931. [PMID: 36412748 PMCID: PMC9680361 DOI: 10.3390/idr14060090] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/18/2022] [Revised: 11/08/2022] [Accepted: 11/09/2022] [Indexed: 11/22/2022] Open
Abstract
The COVID-19 pandemic placed significant stress on an already overburdened health system. The diagnosis was based on detection of a positive RT-PCR test, which may be delayed when there is peak demand for testing. Rapid risk stratification of high-risk patients allows for the prioritization of resources for patient care. The study aimed to classify patients as severe or not severe based on outcomes, using machine learning on routine laboratory tests. Data were extracted for all individuals who had at least one SARS-CoV-2 PCR test conducted via the NHLS between 1 March 2020 and 7 July 2020. Exclusion criteria: those under 18 years, and those with indeterminate PCR tests. Results for 15,437 patients (3,301 positive and 12,136 negative) were used to fit six machine learning models, namely logistic regression (LR) (the base model), decision trees (DT), random forest (RF), extreme gradient boosting (XGB), convolutional neural network (CNN), and self-normalising neural network (SNN). Model development was carried out by splitting the data into training and testing sets in a 70:30 ratio, together with a 10-fold cross-validation re-sampling technique. For risk stratification, admission to high care or ICU was the outcome for severe disease. Performance of the models varied: sensitivity was best for RF at 75%, and accuracy was best for CNN at 75%. The area under the curve ranged from 57% for CNN to 75% for RF. RF and SNN were the best-performing models. Machine learning (ML) can be incorporated into the laboratory information system and offers promise for early identification and risk stratification of COVID-19 patients, particularly in resource-poor settings.
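The evaluation protocol described above (a 70:30 split combined with 10-fold cross-validation) can be sketched with index bookkeeping alone. The abstract does not say how the two steps were combined; applying the folds to the training portion only is an assumption of this sketch, and all function names are hypothetical.

```python
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle the indices 0..n-1 and hold out a test fraction."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


# Hypothetical sample size for illustration (not the study's cohort).
train_idx, test_idx = train_test_split_indices(100)
for fold_train, fold_val in k_fold_indices(train_idx, k=10):
    # A model would be fitted on fold_train and scored on fold_val here.
    pass
print(len(train_idx), len(test_idx))
```

Keeping the held-out 30% untouched until the end means the cross-validation scores guide model selection while the test set gives a separate estimate of performance.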
Affiliation(s)
- Farai Mlambo
- School of Statistics and Actuarial Science, University of the Witwatersrand, 1 Jan Smuts Ave, Braamfontein, Johannesburg 2000, South Africa
| | - Cyril Chironda
- School of Statistics and Actuarial Science, University of the Witwatersrand, 1 Jan Smuts Ave, Braamfontein, Johannesburg 2000, South Africa
| | - Jaya George
- Department of Chemical Pathology, University of Witwatersrand, 29 Princess of Wales Terrace, Parktown, Johannesburg 2193, South Africa
- National Health Laboratory Services of South Africa, 1 Modderfontein Road, Sandringham, Johannesburg 2131, South Africa
|
23
|
Verma D, Jansen D, Bach K, Poel M, Mork PJ, d’Hollosy WON. Exploratory application of machine learning methods on patient reported data in the development of supervised models for predicting outcomes. BMC Med Inform Decis Mak 2022; 22:227. [PMID: 36050726 PMCID: PMC9434943 DOI: 10.1186/s12911-022-01973-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/18/2021] [Accepted: 08/22/2022] [Indexed: 01/19/2023] Open
Abstract
BACKGROUND Patient-reported outcome measurements (PROMs) are commonly used in clinical practice to support clinical decision making. However, few studies have investigated machine learning methods for predicting PROMs outcomes and thereby supporting clinical decision making. OBJECTIVE This study investigates to what extent different machine learning methods, applied to two different PROMs datasets, can predict outcomes among patients with non-specific neck and/or low back pain. METHODS Using two datasets consisting of PROMs from (1) care-seeking low back pain patients in primary care who participated in a randomized controlled trial, and (2) patients with neck and/or low back pain referred to multidisciplinary biopsychosocial rehabilitation, we present data science methods for data preprocessing and evaluate selected regression and classification methods for predicting patient outcomes. RESULTS The results show that there is potential for machine learning to predict and classify PROMs. The prediction models based on baseline measurements perform well, and the number of predictors can be reduced, which is an advantage for implementation in decision support scenarios. The classification task shows that the dataset does not contain all necessary predictors for the care type classification. Overall, the work presents generalizable machine learning pipelines that can be adapted to other PROMs datasets. CONCLUSION This study demonstrates the potential of PROMs in predicting short-term patient outcomes. Our results indicate that machine learning methods can be used to exploit the predictive value of PROMs and thereby support clinical decision making, given that the PROMs hold enough predictive power.
Affiliation(s)
- Deepika Verma
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Duncan Jansen
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Twente, The Netherlands
| | - Kerstin Bach
- Department of Computer Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Mannes Poel
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Twente, The Netherlands
| | - Paul Jarle Mork
- Department of Public Health and Nursing, Norwegian University of Science and Technology, Trondheim, Norway
| | - Wendy Oude Nijeweme d’Hollosy
- Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Twente, The Netherlands
- eHealth Cluster, Roessingh Research and Development, Enschede, The Netherlands
|
24
|
Chen Q, Wang Y, Liu Y, Xi B. ESRRG, ATP4A, and ATP4B as Diagnostic Biomarkers for Gastric Cancer: A Bioinformatic Analysis Based on Machine Learning. Front Physiol 2022; 13:905523. [PMID: 35812327 PMCID: PMC9262247 DOI: 10.3389/fphys.2022.905523] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2022] [Accepted: 05/10/2022] [Indexed: 11/13/2022] Open
Abstract
Based on multiple bioinformatics methods and machine learning techniques, this study was designed to explore potential hub genes of gastric cancer with a diagnostic value. The novel biomarkers were detected through multiple databases of gastric cancer–related genes. The NCBI Gene Expression Omnibus (GEO) database was used to obtain gene expression files. Three hub genes (ESRRG, ATP4A, and ATP4B) were detected through a combination of weighted gene co-expression network analysis (WGCNA), gene–gene interaction network analysis, and a supervised feature selection method. GEPIA2 was used to verify the differences in the expression levels of the hub genes between normal and cancer tissues at the RNA-seq level in the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) databases. The objectivity of the potential hub genes was also verified by immunohistochemistry in the Human Protein Atlas (HPA) database and by a transcription factor–hub gene regulatory network. Machine learning (ML) methods including data pre-processing, model selection and cross-validation, and performance evaluation were examined on the hub-gene expression profiles in five GEO datasets and verified on a GEO external validation (EV) dataset. Six supervised learning models (support vector machine, random forest, k-nearest neighbors, neural network, decision tree, and eXtreme Gradient Boosting) and one semi-supervised learning model (label spreading) were established to evaluate the diagnostic value of the biomarkers. Among the six supervised models, the support vector machine (SVM) algorithm was the most effective one according to the calculated performance metrics, including area under the curve (AUC) scores of 0.93 and 0.99 on the test and external validation datasets, respectively. Furthermore, the semi-supervised model could also successfully learn and predict sample types, achieving a 0.986 AUC score on the EV dataset, even when 10% of the samples in the five GEO datasets were labeled.
In conclusion, three hub genes (ATP4A, ATP4B, and ESRRG) closely related to gastric cancer were mined, based on which the ML diagnostic model of gastric cancer was constructed.
Affiliation(s)
- Qiu Chen
- Medical College, Yangzhou University, Yangzhou, China
| | - Yu Wang
- College of Physics Science and Technology, Yangzhou University, Yangzhou, China
| | - Yongjun Liu
- College of Physics Science and Technology, Yangzhou University, Yangzhou, China
| | - Bin Xi
- College of Physics Science and Technology, Yangzhou University, Yangzhou, China
- *Correspondence: Bin Xi,
|
25
|
Ferjani I, Ali Alsaif S. How to get best predictions for road monitoring using machine learning techniques. PeerJ Comput Sci 2022; 8:e941. [PMID: 35494874 PMCID: PMC9044339 DOI: 10.7717/peerj-cs.941] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 09/13/2021] [Accepted: 03/14/2022] [Indexed: 06/14/2023]
Abstract
Road condition monitoring is essential for improving traffic safety and reducing accidents. Machine learning methods have recently gained prominence in the practically important task of controlling road surface quality. Several systems have been proposed using sensors, especially the accelerometers present in smartphones, due to their availability and low cost. However, these methods require practitioners to specify an exact set of features from all the sensors to provide more accurate results, including time, frequency, and wavelet-domain signal features. It is important to know how changes in these features affect machine learning model performance in road anomaly classification tasks. We address this problem by conducting a sensitivity analysis of three machine learning models, Support Vector Machine, Decision Tree, and Multi-Layer Perceptron, to test the effectiveness of each model under different feature selections. We built a feature vector from all three axes of the sensors that boosts classification performance. Our proposed approach achieved an overall accuracy of 94% on four types of road anomalies. To allow an objective analysis of different features, we used available accelerometer datasets. Our objective is to achieve good classification performance on road anomalies by distinguishing between significant and relatively insignificant features. Our baseline machine learning models were chosen for their comparative simplicity and strong empirical performance. The extensive analysis results of our study provide practical advice for practitioners wishing to select features effectively in real-world settings for road anomaly detection.
Affiliation(s)
- Imen Ferjani
- Computer Department, Deanship of Preparatory Year and Supporting Studies, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia
| | - Suleiman Ali Alsaif
- Computer Department, Deanship of Preparatory Year and Supporting Studies, Imam Abdulrahman Bin Faisal University, Dammam, Kingdom of Saudi Arabia
|
26
|
Fan X, Huang X, Zhao Y, Wang L, Yu H, Zhao G. Predicting Prognostic Effects of Acupuncture for Depression Using the Electroencephalogram. EVIDENCE-BASED COMPLEMENTARY AND ALTERNATIVE MEDICINE : ECAM 2022; 2022:1381683. [PMID: 35280515 PMCID: PMC8906952 DOI: 10.1155/2022/1381683] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Accepted: 02/07/2022] [Indexed: 11/17/2022]
Abstract
Depression is considered to be a major public health problem with significant implications for individuals and society. Patients with depression can be treated with complementary therapies such as acupuncture. Predicting the prognostic effects of acupuncture is of great significance in helping physicians make early interventions for patients with depression and avoid malignant events. In this work, a novel framework for predicting the prognostic effects of acupuncture for depression based on electroencephalogram (EEG) recordings is presented. Specifically, EEG, a widely used measurement for evaluating the therapeutic effects of acupuncture, is utilized for predicting the prognostic effects of acupuncture. Max-relevance and min-redundancy (mRMR), which removes redundant information among selected features while retaining high relevance between the selected features and the response variable, is employed to select important lead-rhythm features extracted from EEG recordings. Then, according to the subjects' Hamilton Depression Rating Scale (HAMD) scores before and after eight weeks of acupuncture, the reduction rate of the HAMD score is calculated as a measure of the prognostic effects of acupuncture. Finally, five widely used machine learning methods are utilized for building the models that predict the prognostic effects of acupuncture for depression. Experimental results show that nonlinear machine learning methods perform better than linear ones at predicting the prognostic effects of acupuncture using EEG recordings. In particular, the support vector machine with Gaussian kernel (SVM-RBF) achieves the best and most stable performance using mRMR with both the FCD and FCQ evaluation criteria for feature selection. Both mRMR-FCD and mRMR-FCQ obtain the same best performance, where the accuracy and F1 score are 84.61% and 86.67%, respectively. Moreover, the lead-rhythm features selected by mRMR-FCD and mRMR-FCQ are analyzed.
The top seven selected lead-rhythm features have much higher mRMR evaluation scores, which to some degree underpins the good predictive performance of the machine learning methods. The framework presented in this work is effective in predicting the prognostic effects of acupuncture for depression. It can be integrated into an intelligent medical system and provide physicians with information on the prognostic effects of acupuncture. Knowing the prognostic effects of acupuncture for depression in advance and intervening accordingly can greatly reduce the risk of malignant events for patients with mental disorders.
Affiliation(s)
- Xiaomao Fan
- School of Computer Science, South China Normal University, Guangzhou, China
| | - Xingxian Huang
- Department of Acupuncture and Moxibustion, Shenzhen Traditional Chinese Medicine Hospital, Shenzhen, China
| | - Yang Zhao
- School of Data Science, City University of Hong Kong, Hong Kong SAR, China
| | - Lin Wang
- Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China
| | - Haibo Yu
- Department of Acupuncture and Moxibustion, Shenzhen Traditional Chinese Medicine Hospital, Shenzhen, China
| | - Gansen Zhao
- School of Computer Science, South China Normal University, Guangzhou, China
|
27
|
Ray A. Machine learning in postgenomic biology and personalized medicine. WILEY INTERDISCIPLINARY REVIEWS. DATA MINING AND KNOWLEDGE DISCOVERY 2022; 12:e1451. [PMID: 35966173 PMCID: PMC9371441 DOI: 10.1002/widm.1451] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2020] [Accepted: 12/22/2021] [Indexed: 06/15/2023]
Abstract
In recent years, artificial intelligence in the form of machine learning has been revolutionizing biology, biomedical sciences, and gene-based agricultural technology. Massive data generated in the biological sciences by rapid and deep gene sequencing and by the determination of protein and other molecular structures, on the one hand, require machine learning data analysis capabilities that are distinctly different from classical statistical methods; on the other hand, these large datasets are enabling the adoption of novel data-intensive machine learning algorithms for biological problems that until recently had relied on computationally expensive mechanistic model-based approaches. This review provides a bird's eye view of the applications of machine learning in post-genomic biology. An attempt is also made to indicate, as far as possible, the areas of research that are poised to make further impacts, including the importance of explainable artificial intelligence (XAI) in human health. Further contributions of machine learning are expected to transform medicine, public health, and agricultural technology, as well as to provide invaluable gene-based guidance for the management of complex environments in this age of global warming.
Affiliation(s)
- Animesh Ray
- Riggs School of Applied Life Sciences, Keck Graduate Institute, 535 Watson Drive, Claremont, CA 91711, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California, USA
|
28
|
Aihara S, Shibata R, Mizukami R, Sakai T, Shionoya A. Deep Learning-Based Myoelectric Potential Estimation Method for Wheelchair Operation. SENSORS 2022; 22:s22041615. [PMID: 35214514 PMCID: PMC8875647 DOI: 10.3390/s22041615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2021] [Revised: 02/12/2022] [Accepted: 02/15/2022] [Indexed: 02/01/2023]
Abstract
Wheelchair sports are recognized internationally, and research and support are being promoted to increase their competitiveness. For example, an electromyogram can observe muscle activity. However, it is generally used under controlled conditions due to the complexity of preparing the measurement equipment and the movement restrictions imposed by cables and measurement equipment, so it is difficult to perform measurements in actual competition environments. Therefore, in this study, we developed a method to estimate myoelectric potential that can be used in competitive environments and does not limit physical movement. We developed a deep learning model that takes camera images of wheelchair movements and the measured values of inertial sensors installed on wheelchairs as input and outputs surface myoelectric potentials. For seven subjects, we estimated the myoelectric potential during chair work, which is important in wheelchair sports. After creating an in-subject model and comparing the estimated myoelectric potential with the myoelectric potential measured by an electromyogram, we confirmed a correlation (correlation coefficient 0.5 or greater at a significance level of 0.1%). Since this method can estimate the myoelectric potential without limiting the movement of the body, it could be applied to performance evaluation in wheelchair sports.
Affiliation(s)
- Shimpei Aihara
- Department of Sport Science, Japan Institute of Sports Sciences, 3-15-1 Nishigaoka, Kita-ku, Tokyo 115-0056, Japan
- School of Creative Science and Engineering, Waseda University, Wasedamachi-27, Shinjuku-ku, Tokyo 169-8050, Japan
- Correspondence: (S.A.); (R.S.)
| | - Ryusei Shibata
- Graduate School of Information and Management Systems Engineering, Nagaoka University of Technology, 1603-1, Kamitomioka, Nagaoka, Niigata 940-2188, Japan; (R.M.); (T.S.); (A.S.)
- Correspondence: (S.A.); (R.S.)
| | - Ryosuke Mizukami
- Graduate School of Information and Management Systems Engineering, Nagaoka University of Technology, 1603-1, Kamitomioka, Nagaoka, Niigata 940-2188, Japan; (R.M.); (T.S.); (A.S.)
| | - Takara Sakai
- Graduate School of Information and Management Systems Engineering, Nagaoka University of Technology, 1603-1, Kamitomioka, Nagaoka, Niigata 940-2188, Japan; (R.M.); (T.S.); (A.S.)
| | - Akira Shionoya
- Graduate School of Information and Management Systems Engineering, Nagaoka University of Technology, 1603-1, Kamitomioka, Nagaoka, Niigata 940-2188, Japan; (R.M.); (T.S.); (A.S.)
|
29
|
Weng F, Zhang H, Yang C. Volatility forecasting of crude oil futures based on a genetic algorithm regularization online extreme learning machine with a forgetting factor: The role of news during the COVID-19 pandemic. RESOURCES POLICY 2021; 73:102148. [PMID: 34539033 PMCID: PMC8434824 DOI: 10.1016/j.resourpol.2021.102148] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/20/2020] [Revised: 03/30/2021] [Accepted: 05/05/2021] [Indexed: 05/14/2023]
Abstract
The outbreak of news and opinions during the COVID-19 pandemic is unprecedented in this age of rapid dissemination of information. The ensuing uncertainty has led to heightened volatility in the prices of crude oil futures. This research examines whether such news has predictive value for the volatility of crude oil futures during the COVID-19 pandemic. We propose a modeling framework, a genetic algorithm regularization online extreme learning machine with forgetting factor (GA-RFOS-ELM), to estimate the effects of news during the COVID-19 pandemic on the volatility of crude oil futures. GA-RFOS-ELM can learn block by block, with fixed or varying block size, while taking each block's own validity period into account. The experimental results illustrate that news during the COVID-19 pandemic carries additional predictive information, which is crucial for short-term volatility forecasting of crude oil futures. The novel approach illustrates that an online update learning ability is needed during the COVID-19 pandemic and can be effective and efficient in volatility forecasting of crude oil futures. The contributions of our study are significant for investors and administrators seeking to predict and understand the behavior of volatility during the COVID-19 pandemic.
Affiliation(s)
- Futian Weng
- School of Medicine, Xiamen University, Xiamen, Fujian, 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian, 361005, China
- Data Mining Research Center, Xiamen University, Xiamen, Fujian, 361005, China
| | - Hongwei Zhang
- School of Mathematics and Statistics, Central South University, Changsha, Hunan, 410083, China
- Institute of Metal Resources Strategy, Central South University, Changsha, 410083, China
| | - Cai Yang
- School of Business Administration, Hunan University, Changsha, Hunan, 410082, China
|
30
|
Mascarella MA, Muthukrishnan N, Maleki F, Kergoat MJ, Richardson K, Mlynarek A, Forest VI, Reinhold C, Martin DR, Hier M, Sadeghi N, Forghani R. Above and Beyond Age: Prediction of Major Postoperative Adverse Events in Head and Neck Surgery. Ann Otol Rhinol Laryngol 2021; 131:697-703. [PMID: 34416844 PMCID: PMC9203666 DOI: 10.1177/00034894211041222] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 11/16/2022]
Abstract
OBJECTIVE Major postoperative adverse events (MPAEs) following head and neck surgery are not infrequent and lead to significant morbidity. The objective of this study was to ascertain which factors are most predictive of MPAEs in patients undergoing head and neck surgery. METHODS A cohort study was carried out based on data from patients registered in the National Surgical Quality Improvement Program (NSQIP) from 2006 to 2018. All patients undergoing non-ambulatory head and neck surgery based on Current Procedural Terminology codes were included. Perioperative factors were evaluated to predict MPAEs within 30 days of surgery. Age was classified as both a continuous and categorical variable. Retained factors were classified by attributable fraction and C-statistic. Multivariate regression and supervised machine learning models were used to quantify the contribution of age as a predictor of MPAEs. RESULTS A total of 43 701 operations were analyzed with 5106 (11.7%) MPAEs. Supervised machine learning identified prolonged surgeries, anemia, free tissue transfer, weight loss, wound classification, hypoalbuminemia, wound infection, tracheotomy (concurrent with index head and neck surgery), American Society of Anesthesiologists (ASA) class, and sex as most predictive of MPAEs. On multivariate regression, ASA class (21.3%), hypertension on medication (15.8%), prolonged operative time (15.3%), sex (13.1%), preoperative anemia (12.8%), and free tissue transfer (9%) had the largest attributable fractions associated with MPAEs. Age was independently associated with MPAEs with an attributable fraction ranging from 0.6% to 4.3% with poor predictive ability (C-statistic 0.60). CONCLUSION Surgical, comorbid, and frailty-related factors were most predictive of short-term MPAEs following head and neck surgery. Age alone contributed a small attributable fraction and poor prediction of MPAEs. LEVEL OF EVIDENCE 3.
Affiliation(s)
- Marco A Mascarella
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, QC, Canada; Centre for Clinical Epidemiology, Lady Davis Institute of the Jewish General Hospital, Montreal, QC, Canada
| | - Nikesh Muthukrishnan
- Augmented Intelligence & Precision Health Laboratory (AIPHL) of the Department of Radiology and the Research Institute of McGill University Health Centre, Montreal, QC, Canada; Department of Radiology, McGill University, Montreal, QC, Canada
| | - Farhad Maleki
- Augmented Intelligence & Precision Health Laboratory (AIPHL) of the Department of Radiology and the Research Institute of McGill University Health Centre, Montreal, QC, Canada; Department of Radiology, McGill University, Montreal, QC, Canada
| | - Marie-Jeanne Kergoat
- Department of Geriatric Medicine, Geriatric Institute of Montreal, University of Montreal, Montreal, QC, Canada
| | - Keith Richardson
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, QC, Canada
| | - Alex Mlynarek
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, QC, Canada
| | | | - Caroline Reinhold
- Augmented Intelligence & Precision Health Laboratory (AIPHL) of the Department of Radiology and the Research Institute of McGill University Health Centre, Montreal, QC, Canada; Department of Radiology, McGill University, Montreal, QC, Canada
| | - Diego R Martin
- Augmented Intelligence & Precision Health Laboratory (AIPHL) of the Department of Radiology and the Research Institute of McGill University Health Centre, Montreal, QC, Canada; Department of Radiology, McGill University, Montreal, QC, Canada
| | - Michael Hier
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, QC, Canada
| | - Nader Sadeghi
- Department of Otolaryngology-Head and Neck Surgery, McGill University, Montreal, QC, Canada; Research Institute of the McGill University Health Centre, McGill University, Montreal, QC, Canada
| | - Reza Forghani
- Augmented Intelligence & Precision Health Laboratory (AIPHL) of the Department of Radiology and the Research Institute of McGill University Health Centre, Montreal, QC, Canada; Department of Radiology, McGill University, Montreal, QC, Canada; Segal Cancer Centre, Lady Davis Research Institute, Jewish General Hospital, Montreal, QC, Canada
|
31
|
Mangino AA, Smith KA, Finch WH, Hernández-Finch ME. Improving Predictive Classification Models Using Generative Adversarial Networks in the Prediction of Suicide Attempts. MEASUREMENT AND EVALUATION IN COUNSELING AND DEVELOPMENT 2021. [DOI: 10.1080/07481756.2021.1906156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
32
|
Song M, Jung H, Lee S, Kim D, Ahn M. Diagnostic Classification and Biomarker Identification of Alzheimer's Disease with Random Forest Algorithm. Brain Sci 2021; 11:453. [PMID: 33918453 PMCID: PMC8065661 DOI: 10.3390/brainsci11040453] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 03/05/2021] [Revised: 03/29/2021] [Accepted: 03/31/2021] [Indexed: 11/29/2022] Open
Abstract
Random Forest (RF) is a bagging ensemble model with many important advantages, such as robustness to noise, an effective structure for complex multimodal data and parallel computing, and feature importance estimates that help investigate biomarkers. Despite these benefits, RF is not actively used to predict Alzheimer's disease (AD) from brain MRIs. Recent studies have reported RF's effectiveness in predicting AD, but the test sample sizes were too small to draw any solid conclusions. Thus, it is timely to compare RF with other learning methods, including deep learning, particularly with large amounts of data. In this study, we tested RF and various machine learning models with regional volumes from 2250 brain MRIs provided by the Alzheimer's Disease Neuroimaging Initiative (ADNI) database: 687 normal controls (NC), 1094 mild cognitive impairment (MCI), and 469 AD. Three feature sets (63, 29, and 22 features) were selected, and classification accuracies were computed with RF, support vector machine (SVM), multi-layer perceptron (MLP), and convolutional neural network (CNN). As a result, RF, MLP, and CNN showed high accuracies of 90.2%, 89.6%, and 90.5% with 63 features. Interestingly, when 22 features were used, RF showed the smallest decrease in accuracy, -3.8%, and its standard deviation did not change significantly, while MLP and CNN yielded decreases in accuracy of -6.8% and -4.5%, with changes in the standard deviation from 3.3% to 4.0% for MLP and 2.1% to 7.0% for CNN, indicating that RF predicts AD more reliably with fewer features. In addition, we investigated the feature importance that RF provides, and identified the hippocampus, amygdala, and inferior lateral ventricle as the major contributors in classifying NC, MCI, and AD. On average, AD showed smaller hippocampus and amygdala volumes and a larger inferior lateral ventricle volume than MCI and NC.
Affiliation(s)
- Minseok Song
- School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea; (M.S.); (H.J.); (S.L.)
| | - Hyeyoom Jung
- School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea; (M.S.); (H.J.); (S.L.)
| | - Seungyong Lee
- School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea; (M.S.); (H.J.); (S.L.)
| | | | - Minkyu Ahn
- School of Computer Science and Electrical Engineering, Handong Global University, Pohang-si 37554, Korea; (M.S.); (H.J.); (S.L.)
|
33
|
Meshram SG, Safari MJS, Khosravi K, Meshram C. Iterative classifier optimizer-based pace regression and random forest hybrid models for suspended sediment load prediction. Environmental Science and Pollution Research International 2021; 28:11637-11649. [PMID: 33125681 DOI: 10.1007/s11356-020-11335-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/05/2020] [Accepted: 10/20/2020] [Indexed: 06/11/2023]
Abstract
Suspended sediment load is a substantial portion of the total sediment load in rivers and plays a vital role in determining the service life of a downstream dam. To this end, estimation models are needed to compute suspended sediment load in rivers. The application of artificial intelligence (AI) techniques has become popular in water resources engineering for solving complex problems such as sediment transport modeling. In this study, novel integrative intelligence models coupled with the iterative classifier optimizer (ICO) are proposed to compute suspended sediment load at Simga station in the Seonath river basin, Chhattisgarh State, India. The proposed models hybridize the random forest (RF) and pace regression (PR) models with the ICO algorithm to develop ICO-RF and ICO-PR hybrid models. The recommended models are established using daily discharge and sediment data spanning a 35-year period (1980-2015). The accuracy of the developed models is examined in terms of error, using root mean square error (RMSE) and mean absolute error (MAE), and in terms of fit, using the coefficient of determination (R2). The proposed hybrid models, ICO-RF and ICO-PR, were found to be more precise than their stand-alone counterparts, RF and PR. Overall, ICO-RF models delivered better accuracy than their alternatives. The results of this analysis support the appropriateness of the implemented methodology for precise modeling of the suspended sediment load in rivers.
Affiliation(s)
- Sarita Gajbhiye Meshram
- Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam.
- Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Vietnam.
| | | | - Khabat Khosravi
- Department of Watershed Management Engineering, Sari Agricultural Science and Natural Resources University, Sari, Iran
| | - Chandrashekhar Meshram
- Department of Post Graduate Studies and Research in Mathematics, Jayawanti Haksar Government Post Graduation College, College of Chhindwara University, Chhindwara, Betul, India
|
34
|
Visual light perceptions caused by medical linear accelerator: Findings of machine-learning algorithms in a prospective questionnaire-based case-control study. PLoS One 2021; 16:e0247597. [PMID: 33630912 PMCID: PMC7906346 DOI: 10.1371/journal.pone.0247597] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/16/2020] [Accepted: 02/09/2021] [Indexed: 12/03/2022] Open
Abstract
This study aimed to investigate the possible incidence of visual light perceptions (VLPs) during radiation therapy (RT). We analyzed whether VLPs could be affected by differences in the radiation energy, prescription doses, age, sex, or RT locations, and whether all VLPs were caused by radiation. From November 2016 to August 2018, a total of 101 patients who underwent head-and-neck or brain RT were screened. After receiving RT, questionnaires were completed, and the subjects were interviewed. Random forests (RF), a tree-based machine learning algorithm, and logistic regression (LR) analyses were compared by the area under the curve (AUC), and the algorithm that achieved the highest AUC was selected. The dataset sample was based on treatment with non-human units, and a total of 293 treatment fields from 78 patients were analyzed. VLPs were detected only in 122 of the 293 exposure portals (40.16%). The dataset was randomly divided into 80% and 20% as the training set and test set, respectively. In the test set, RF achieved an AUC of 0.888, whereas LR achieved an AUC of 0.773. In this study, the retina fraction dose was the most important continuous variable and had a positive effect on VLP. Age was the most important categorical variable. In conclusion, the visual light perception phenomenon by the human body during RT is induced by radiation rather than being a self-suggested hallucination or induced by phosphenes.
|
35
|
Nie X, Cai Y, Liu J, Liu X, Zhao J, Yang Z, Wen M, Liu L. Mortality Prediction in Cerebral Hemorrhage Patients Using Machine Learning Algorithms in Intensive Care Units. Front Neurol 2021; 11:610531. [PMID: 33551969 PMCID: PMC7855582 DOI: 10.3389/fneur.2020.610531] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/26/2020] [Accepted: 12/11/2020] [Indexed: 12/12/2022] Open
Abstract
Objectives: This study aims to investigate whether machine learning algorithms could provide an optimal early mortality prediction method, compared with other scoring systems, for patients with cerebral hemorrhage in intensive care units in clinical practice. Methods: All cerebral hemorrhage patients in the Medical Information Mart for Intensive Care III (MIMIC-III) database who were monitored with the MetaVision system and admitted to intensive care units between 2008 and 2012 were enrolled in this study. The calibration, discrimination, and risk classification of predicted hospital mortality based on machine learning algorithms were assessed. The primary outcome was hospital mortality. Model performance was assessed with accuracy and receiver operating characteristic curve analysis. Results: Of 760 cerebral hemorrhage patients enrolled from the MIMIC database [mean age, 68.2 years (SD, ±15.5)], 383 (50.4%) patients died in hospital, and 377 (49.6%) patients survived. The area under the receiver operating characteristic curve (AUC) of six machine learning algorithms was 0.600 (nearest neighbors), 0.617 (decision tree), 0.655 (neural net), 0.671 (AdaBoost), 0.819 (random forest), and 0.725 (gcForest). The AUC was 0.423 for the Acute Physiology and Chronic Health Evaluation II score. The random forest had the highest specificity and accuracy, as well as the greatest AUC, showing the best ability to predict in-hospital mortality. Conclusions: Compared with the conventional scoring system and the other five machine learning algorithms in this study, the random forest algorithm performed better in predicting in-hospital mortality for cerebral hemorrhage patients in intensive care units, and thus further research should be conducted on the random forest algorithm.
Affiliation(s)
- Ximing Nie
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Yuan Cai
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China.,Department of Medicine and Therapeutics, Prince of Wales Hospital, Chinese University of Hong Kong, Hong Kong, China
| | - Jingyi Liu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Xiran Liu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Jiahui Zhao
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Zhonghua Yang
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Miao Wen
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
| | - Liping Liu
- Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China.,China National Clinical Research Center for Neurological Diseases, Beijing, China
|
36
|
Lupton-Smith C, Stuart EA, McGinty EE, Dalcin AT, Jerome GJ, Wang NY, Daumit GL. Determining Predictors of Weight Loss in a Behavioral Intervention: A Case Study in the Use of Lasso Regression. Front Psychiatry 2021; 12:707707. [PMID: 35185628 PMCID: PMC8850776 DOI: 10.3389/fpsyt.2021.707707] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/10/2021] [Accepted: 12/29/2021] [Indexed: 01/26/2023] Open
Abstract
OBJECTIVE This study investigates predictors of weight loss among individuals with serious mental illness participating in an 18-month behavioral weight loss intervention, using Lasso regression to select the most powerful predictors. METHODS Data were analyzed from the intervention group of the ACHIEVE trial, an 18-month behavioral weight loss intervention in adults with serious mental illness. Lasso regression was employed to identify predictors of at least five-pound weight loss across the intervention time span. Once predictors were identified, classification trees were created to show examples of how to classify participants into having likely outcomes based on characteristics at baseline and during the intervention. RESULTS The analyzed sample contained 137 participants. Seventy-one (51.8%) individuals had a net weight loss of at least five pounds from baseline to 18 months. The Lasso regression selected weight loss from baseline to 6 months as a primary predictor of at least five pound 18-month weight loss, with a standardized coefficient of 0.51 (95% CI: -0.37, 1.40). Three other variables were also selected in the regression but added minimal predictive ability. CONCLUSIONS The analyses in this paper demonstrate the importance of tracking weight loss incrementally during an intervention as an indicator for overall weight loss, as well as the challenges in predicting long-term weight loss with other variables commonly available in clinical trials. The methods used in this paper also exemplify how to effectively analyze a clinical trial dataset containing many variables and identify factors related to desired outcomes.
Affiliation(s)
- Carly Lupton-Smith
- Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Elizabeth A Stuart
- Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Emma E McGinty
- Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
| | - Arlene T Dalcin
- Johns Hopkins School of Medicine, Baltimore, MD, United States
| | - Gerald J Jerome
- Department of Kinesiology, Towson University, Towson, MD, United States
| | - Nae-Yuh Wang
- Johns Hopkins School of Medicine, Baltimore, MD, United States
| | - Gail L Daumit
- Johns Hopkins School of Medicine, Baltimore, MD, United States
|
37
|
Kuzmin K, Adeniyi AE, DaSouza AK, Lim D, Nguyen H, Molina NR, Xiong L, Weber IT, Harrison RW. Machine learning methods accurately predict host specificity of coronaviruses based on spike sequences alone. Biochem Biophys Res Commun 2020; 533:553-558. [PMID: 32981683 PMCID: PMC7500881 DOI: 10.1016/j.bbrc.2020.09.010] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 09/03/2020] [Accepted: 09/06/2020] [Indexed: 11/19/2022]
Abstract
Coronaviruses infect many animals, including humans, due to interspecies transmission. Three of the known human coronaviruses cause severe disease: MERS, SARS-CoV-1, and SARS-CoV-2, the pathogen of the COVID-19 pandemic. Improved methods to predict host specificity of coronaviruses will be valuable for identifying and controlling future outbreaks. The coronavirus S protein plays a key role in host specificity by attaching the virus to receptors on the cell membrane. We analyzed 1238 spike sequences for their host specificity. Spike sequences readily segregate in t-SNE embeddings into clusters of similar hosts and/or virus species. Machine learning with SVM, Logistic Regression, Decision Tree, and Random Forest gave high average accuracies, F1 scores, sensitivities and specificities of 0.95-0.99. Importantly, sites identified by Decision Tree correspond to protein regions with known biological importance. These results demonstrate that spike sequences alone can be used to predict host specificity.
Affiliation(s)
- Kiril Kuzmin
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA.
| | | | - Arthur Kevin DaSouza
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Deuk Lim
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Huyen Nguyen
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Nuria Ramirez Molina
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Lanqiao Xiong
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Irene T Weber
- Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA
| | - Robert W Harrison
- Department of Computer Science, Georgia State University, 1 Park Place, Atlanta, GA, 30303, USA; Department of Biology, Georgia State University, 145 Piedmont Ave SE, Atlanta, GA, 30303, USA.
|
38
|
Farhadian M, Torkaman S, Mojarad F. Random forest algorithm to identify factors associated with sports-related dental injuries in 6 to 13-year-old athlete children in Hamadan, Iran-2018 -a cross-sectional study. BMC Sports Sci Med Rehabil 2020; 12:69. [PMID: 33292522 PMCID: PMC7659093 DOI: 10.1186/s13102-020-00217-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/20/2020] [Accepted: 10/29/2020] [Indexed: 11/15/2022]
Abstract
Background Traumatic dental injuries are among the most important problems affecting children and adolescents, with major physical, aesthetic, psychological, social, functional and therapeutic consequences that adversely affect quality of life. Recently, the development of methods based on machine learning algorithms has provided researchers with more powerful tools to predict more accurately in different domains and to evaluate the factors affecting different phenomena more reliably than traditional regression models. This study investigates the performance of random forest (RF) in identifying factors associated with sports-related dental injuries. In addition, the accuracy of the RF model for predicting sports-related dental injuries was compared with that of a logistic regression model as the traditional competitor. Methods This cross-sectional study included 356 athlete children aged 6 to 13 years in Hamadan, Iran. Random forest and logistic regression models were constructed using sports-related dental injury as the response variable and age, sex, parent’s education, child’s birth order, type of sports activity, duration of sports activity, awareness regarding the mouthguard, and mouthguard use as inputs. A self-reported questionnaire was used to obtain information. Results Fifty-five (15.4%) subjects had experienced a sports-related dental injury. The mean age of children with sports injuries was significantly higher than that of children without injury (p = 0.006). The prevalence of injury was significantly higher in boys (p = 0.008). Children with illiterate mothers were more likely to be injured than children with educated mothers (p = 0.045). Awareness of the mouthguard and its use during exercise had a significant effect on reducing the prevalence of injury among users (p < 0.001). The random forest model had a higher prediction accuracy (89.3%) for sports-related dental injuries than logistic regression (84.2%). The RF variable importance results showed that mouthguard use and mouthguard awareness contributed most to the prediction of sports-related dental injuries, followed by sex and age. Conclusions Using predictive models such as RF can reduce the inaccurate predictions that arise from high complexity and interactions between variables. This helps to achieve more accurate identification of factors in sports-related dental injury among the general population of children.
Affiliation(s)
- Maryam Farhadian
- Department of Biostatistics, School of Public Health and Research Center for Health Sciences, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Sima Torkaman
- Pediatric Dentistry Department, Dentistry School, Hamadan University of Medical Sciences, Hamadan, Iran
| | - Farzad Mojarad
- Pediatric Dentistry Department, Dentistry School, Hamadan University of Medical Sciences, P.O. Box 4171-65175, Hamadan, Iran.
|
39
|
Petrea ȘM, Costache M, Cristea D, Strungaru ȘA, Simionov IA, Mogodan A, Oprica L, Cristea V. A Machine Learning Approach in Analyzing Bioaccumulation of Heavy Metals in Turbot Tissues. Molecules 2020; 25:E4696. [PMID: 33066472 PMCID: PMC7587397 DOI: 10.3390/molecules25204696] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/30/2020] [Revised: 10/07/2020] [Accepted: 10/09/2020] [Indexed: 12/02/2022] Open
Abstract
Metals are considered to be one of the most hazardous substances due to their potential for accumulation, magnification, persistence, and wide distribution in water, sediments, and aquatic organisms. Demersal fish species, such as turbot (Psetta maxima maeotica), are accepted by the scientific communities as suitable bioindicators of heavy metal pollution in the aquatic environment. The present study uses a machine learning approach, which is based on multiple linear and non-linear models, in order to effectively estimate the concentrations of heavy metals in both turbot muscle and liver tissues. For multiple linear regression (MLR) models, the stepwise method was used, while non-linear models were developed by applying random forest (RF) algorithm. The models were based on data that were provided from scientific literature, attributed to 11 heavy metals (As, Ca, Cd, Cu, Fe, K, Mg, Mn, Na, Ni, Zn) from both muscle and liver tissues of turbot exemplars. Significant MLR models were recorded for Ca, Fe, Mg, and Na in muscle tissue and K, Cu, Zn, and Na in turbot liver tissue. The non-linear tree-based RF prediction models (over 70% prediction accuracy) were identified for As, Cd, Cu, K, Mg, and Zn in muscle tissue and As, Ca, Cd, Mg, and Fe in turbot liver tissue. Both machine learning MLR and non-linear tree-based RF prediction models were identified to be suitable for predicting the heavy metal concentration from both turbot muscle and liver tissues. The models can be used for improving the knowledge and economic efficiency of linked heavy metals food safety and environment pollution studies.
Affiliation(s)
- Ștefan-Mihai Petrea
- Department of Food Science, Food Engineering, Biotechnology and Aquaculture, Faculty of Food Science and Engineering, University “Dunărea de Jos” of Galați, 800008 Galați, Romania; (I.-A.S.); (A.M.); (V.C.)
| | - Mioara Costache
- The Fish Culture Research and Development Station of Nucet, 137335 Dâmbovița-Nucet, Romania
| | - Dragoș Cristea
- Faculty of Economics and Business, University “Dunărea de Jos” of Galați, 800008 Galați, Romania;
| | - Ștefan-Adrian Strungaru
- Institute for Interdisciplinary Research, Science Research Department, “Alexandru Ioan Cuza” University of Iasi, Lascar Catargi Str. 54, 700107 Iasi, Romania;
| | - Ira-Adeline Simionov
- Department of Food Science, Food Engineering, Biotechnology and Aquaculture, Faculty of Food Science and Engineering, University “Dunărea de Jos” of Galați, 800008 Galați, Romania; (I.-A.S.); (A.M.); (V.C.)
- Multidisciplinary Research Platform (ReForm), University “Dunărea de Jos” of Galați, 800008 Galați, Romania
| | - Alina Mogodan
- Department of Food Science, Food Engineering, Biotechnology and Aquaculture, Faculty of Food Science and Engineering, University “Dunărea de Jos” of Galați, 800008 Galați, Romania; (I.-A.S.); (A.M.); (V.C.)
| | - Lacramioara Oprica
- Department of Biology, Faculty of Biology, Alexandru Ioan Cuza University, 700506 Iasi, Romania;
| | - Victor Cristea
- Department of Food Science, Food Engineering, Biotechnology and Aquaculture, Faculty of Food Science and Engineering, University “Dunărea de Jos” of Galați, 800008 Galați, Romania; (I.-A.S.); (A.M.); (V.C.)
- Multidisciplinary Research Platform (ReForm), University “Dunărea de Jos” of Galați, 800008 Galați, Romania
|
40
|
Waked JP, Canuto MPLDAM, Gueiros MCSN, Aroucha JMCNL, Farias CG, Caldas ADF. Model for Predicting Temporomandibular Dysfunction: Use of Classification Tree Analysis. Braz Dent J 2020; 31:360-367. [PMID: 32901710 DOI: 10.1590/0103-6440202003279] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/26/2019] [Accepted: 02/17/2020] [Indexed: 11/21/2022] Open
Abstract
The aim of this study was to construct a predictive model that uses classification tree statistical analysis to predict the occurrence of temporomandibular disorder, by dividing the sample into groups of high and low risk for the development of the disease. The use of predictive statistical approaches that facilitate the process of recognizing and/or predicting the occurrence of temporomandibular disorder is of interest to the scientific community, for the purpose of providing patients with more adequate solutions in each case. This was a cross-sectional analytical population-based study that involved a sample of 776 individuals who had sought medical or dental attendance at the Family Health Units in Recife, PE, Brazil. The sample was submitted to anamnesis using the instrument Research Diagnostic Criteria for Temporomandibular Disorders. The data were inserted into the software Statistical Package for the Social Sciences 20.0 and analyzed by the Pearson Chi-square test for bivariate analysis, and by the classification tree method for the multivariate analysis. Temporomandibular disorder could be predicted by orofacial pain, age and depression. The high-risk group was composed of individuals with orofacial pain, those between the ages of 25 and 59 years and those who presented depression. The low risk group was composed of individuals without orofacial pain. The authors were able to conclude that the best predictor for temporomandibular disorder was orofacial pain, and that the predictive model proposed by the classification tree could be applied as a tool for simplifying decision making relative to the occurrence of temporomandibular disorder.
Affiliation(s)
- Jorge P Waked
- Center for Rural Health and Technology, Academic Unit of Biological Sciences, UFCG - Universidade Federal de Campina Grande, Patos, PB, Brazil
| | - Mariana P L de A M Canuto
- Health Science Center, Department of Clinical and Preventive Dentistry, UFPE - Universidade Federal de Pernambuco, Recife, PE, Brazil
| | - Maria Cecilia S N Gueiros
- Health Science Center, Department of Clinical and Preventive Dentistry, UFPE - Universidade Federal de Pernambuco, Recife, PE, Brazil
| | - João Marcílio C N L Aroucha
- Health Science Center, Department of Clinical and Preventive Dentistry, UFPE - Universidade Federal de Pernambuco, Recife, PE, Brazil
| | - Cleysiane G Farias
- Health Science Center, Department of Clinical and Preventive Dentistry, UFPE - Universidade Federal de Pernambuco, Recife, PE, Brazil
| | - Arnaldo de F Caldas
- Health Science Center, Department of Clinical and Preventive Dentistry, UFPE - Universidade Federal de Pernambuco, Recife, PE, Brazil
|
41
|
DePaoli D, Lemoine É, Ember K, Parent M, Prud’homme M, Cantin L, Petrecca K, Leblond F, Côté DC. Rise of Raman spectroscopy in neurosurgery: a review. Journal of Biomedical Optics 2020; 25:1-36. [PMID: 32358930 PMCID: PMC7195442 DOI: 10.1117/1.jbo.25.5.050901] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 02/25/2020] [Accepted: 04/10/2020] [Indexed: 05/21/2023]
Abstract
SIGNIFICANCE Although the clinical potential for Raman spectroscopy (RS) has been anticipated for decades, it has only recently been used in neurosurgery. Still, few devices have succeeded in making their way into the operating room. With recent technological advancements, however, vibrational sensing is poised to be a revolutionary tool for neurosurgeons. AIM We give a summary of neurosurgical workflows and key translational milestones of RS in clinical use and provide the optics and data science background required to implement such devices. APPROACH We performed an extensive review of the literature, with a specific emphasis on research that aims to build Raman systems suited for a neurosurgical setting. RESULTS The main translatable interest in Raman sensing rests in its capacity to yield label-free molecular information from tissue intraoperatively. Systems that have proven usable in the clinical setting are ergonomic, have a short integration time, and can acquire high-quality signal even in suboptimal conditions. Moreover, because of the complex microenvironment of brain tissue, data analysis is now recognized as a critical step in achieving high performance Raman-based sensing. CONCLUSIONS The next generation of Raman-based devices are making their way into operating rooms and their clinical translation requires close collaboration between physicians, engineers, and data scientists.
Affiliation(s)
- Damon DePaoli
- Université Laval, CERVO Brain Research Center, Québec, Canada
- Université Laval, Centre d’optique, Photonique et Lasers, Québec, Canada
| | - Émile Lemoine
- Polytechnique Montréal, Department of Engineering Physics, Montréal, Canada
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal, Canada
| | - Katherine Ember
- Polytechnique Montréal, Department of Engineering Physics, Montréal, Canada
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal, Canada
| | - Martin Parent
- Université Laval, CERVO Brain Research Center, Québec, Canada
| | - Michel Prud’homme
- Hôpital de l’Enfant-Jésus, Department of Neurosurgery, Québec, Canada
| | - Léo Cantin
- Hôpital de l’Enfant-Jésus, Department of Neurosurgery, Québec, Canada
| | - Kevin Petrecca
- McGill University, Montreal Neurological Institute-Hospital, Department of Neurology and Neurosurgery, Montreal, Canada
| | - Frédéric Leblond
- Polytechnique Montréal, Department of Engineering Physics, Montréal, Canada
- Centre de Recherche du Centre Hospitalier de l’Université de Montréal, Montréal, Canada
| | - Daniel C. Côté
- Université Laval, CERVO Brain Research Center, Québec, Canada
- Université Laval, Centre d’optique, Photonique et Lasers, Québec, Canada
|
42
|
Liu B. BioSeq-Analysis: a platform for DNA, RNA and protein sequence analysis based on machine learning approaches. Brief Bioinform 2020; 20:1280-1294. [PMID: 29272359 DOI: 10.1093/bib/bbx165] [Citation(s) in RCA: 194] [Impact Index Per Article: 38.8] [Received: 09/08/2017] [Revised: 11/08/2017] [Indexed: 01/07/2023] Open
Abstract
With the avalanche of biological sequences generated in the post-genomic age, one of the most challenging problems is how to computationally analyze their structures and functions. Machine learning techniques are playing key roles in this field. Typically, predictors based on machine learning techniques contain three main steps: feature extraction, predictor construction and performance evaluation. Although several Web servers and stand-alone tools have been developed to facilitate the biological sequence analysis, they only focus on individual step. In this regard, in this study a powerful Web server called BioSeq-Analysis (http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/) has been proposed to automatically complete the three main steps for constructing a predictor. The user only needs to upload the benchmark data set. BioSeq-Analysis can generate the optimized predictor based on the benchmark data set, and the performance measures can be reported as well. Furthermore, to maximize user's convenience, its stand-alone program was also released, which can be downloaded from http://bioinformatics.hitsz.edu.cn/BioSeq-Analysis/download/, and can be directly run on Windows, Linux and UNIX. Applied to three sequence analysis tasks, experimental results showed that the predictors generated by BioSeq-Analysis even outperformed some state-of-the-art methods. It is anticipated that BioSeq-Analysis will become a useful tool for biological sequence analysis.
43
Lee S, Choe EK, Kang HY, Yoon JW, Kim HS. The exploration of feature extraction and machine learning for predicting bone density from simple spine X-ray images in a Korean population. Skeletal Radiol 2020; 49:613-618. [PMID: 31760458 DOI: 10.1007/s00256-019-03342-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 08/22/2019] [Revised: 11/05/2019] [Accepted: 11/07/2019] [Indexed: 02/02/2023]
Abstract
OBJECTIVE Osteoporosis is hard to detect before it manifests symptoms and complications. In this study, we evaluated machine learning models for identifying individuals with abnormal bone mineral density (BMD) through an analysis of spine X-ray features extracted by deep learning to alert high-risk osteoporosis populations. MATERIALS AND METHODS We retrospectively used data obtained from health check-ups including spine X-ray and dual-energy X-ray absorptiometry (DXA). Consecutively, we selected people with normal and abnormal bone mineral density. From the regions of interest of X-ray images, deep convolutional networks were used to generate image features. We designed prediction models for abnormal BMD by training machine learning classification algorithms on the image features. The performance of each model was evaluated. RESULTS From 334 participants, 170 images of abnormal BMD (T scores < −1.0 standard deviations (SD)) and 164 of normal BMD (T scores ≥ −1.0 SD) were used for analysis. We found that a combination of feature extraction by VGGnet and classification by random forest based on the maximum balanced classification rate (BCR) yielded the best performance in terms of the area under the curve (AUC) (0.74), accuracy (0.71), sensitivity (0.81), specificity (0.60), BCR (0.70), and F1-score (0.73). CONCLUSION In this study, we explored various machine learning algorithms for the prediction of BMD using simple spine X-ray image features extracted by three deep learning algorithms. We identified the combination for the best performance in predicting high-risk populations with abnormal BMD.
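The metrics reported in this abstract can all be derived from a confusion matrix. The sketch below assumes that BCR is the arithmetic mean of sensitivity and specificity, which the abstract does not state explicitly but which agrees with its figures: (0.81 + 0.60) / 2 = 0.705, reported as 0.70. The class name, function names and the confusion counts are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # abnormal BMD, predicted abnormal
    fn: int  # abnormal BMD, predicted normal
    tn: int  # normal BMD, predicted normal
    fp: int  # normal BMD, predicted abnormal


def sensitivity(c):
    return c.tp / (c.tp + c.fn)


def specificity(c):
    return c.tn / (c.tn + c.fp)


def balanced_classification_rate(sens, spec):
    # Assumed definition: arithmetic mean of sensitivity and specificity.
    return (sens + spec) / 2


def f1_score(c):
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Consistency check that uses only the sensitivity and specificity
# reported in the abstract.
bcr_from_abstract = balanced_classification_rate(0.81, 0.60)
print(f"{bcr_from_abstract:.3f}")

# Hypothetical counts, invented to exercise the functions.
toy = ConfusionCounts(tp=8, fn=2, tn=6, fp=4)
toy_bcr = balanced_classification_rate(sensitivity(toy), specificity(toy))
toy_f1 = f1_score(toy)
```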
Affiliation(s)
- Eun Kyung Choe
- Department of Surgery, Seoul National University Hospital Healthcare System Gangnam Center, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul, 135-984, South Korea.
- Department of Surgery, Seoul National University College of Medicine, Seoul, South Korea.
- Hae Yeon Kang
- Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, South Korea
- Ji Won Yoon
- Department of Internal Medicine, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, South Korea
- Hua Sun Kim
- Department of Radiology, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, South Korea
44
Díez-Sanmartín C, Sarasa Cabezuelo A. Application of Artificial Intelligence Techniques to Predict Survival in Kidney Transplantation: A Review. J Clin Med 2020; 9:572. [PMID: 32093027 PMCID: PMC7074285 DOI: 10.3390/jcm9020572] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/29/2020] [Revised: 02/17/2020] [Accepted: 02/18/2020] [Indexed: 12/20/2022] Open
Abstract
A key issue in the field of kidney transplants is the analysis of transplant recipients' survival. By means of the information obtained from transplant patients, it is possible to analyse in which cases a transplant has a higher likelihood of success and the factors on which it will depend. In general, these analyses have been conducted by applying traditional statistical techniques, as the amount and variety of data available about kidney transplant processes were limited. However, two main changes have taken place in this field in the last decade. Firstly, medical information has been digitalised through the use of electronic health records (EHRs), which store patients' medical histories electronically and so facilitate automatic information processing through specialised software. Secondly, medical Big Data has provided access to vast amounts of data on medical processes. The information currently available on kidney transplants is huge and varied by comparison to that initially available for this kind of study. This new context has led to the use of non-traditional techniques that are more suitable for conducting survival analyses in these new conditions. Specifically, this paper provides a review of the main machine learning methods and tools that are being used to conduct kidney transplant patient and graft survival analyses.
Affiliation(s)
- Antonio Sarasa Cabezuelo
- Department of Computer Systems and Computing, School of Computer Science, Complutense University of Madrid, 28040 Madrid, Spain
45
JAVADI A, KHAMESIPOUR A, MONAJEMI F, GHAZISAEEDI M. Computational Modeling and Analysis to Predict Intracellular Parasite Epitope Characteristics Using Random Forest Technique. IRANIAN JOURNAL OF PUBLIC HEALTH 2020; 49:125-133. [PMID: 32309231 PMCID: PMC7152625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2018] [Accepted: 10/12/2018] [Indexed: 11/23/2022]
Abstract
BACKGROUND In a new approach, computational methods are used to design and evaluate vaccines. The aim of the current study was to develop a computational tool to predict epitope candidate vaccines to be tested in experimental models. METHODS This study was conducted in the School of Allied Medical Sciences, and Center for Research and Training in Skin Diseases and Leprosy, Tehran University of Medical Sciences, Tehran, Iran in 2018. Random forest, a classifier method, was used to design a computer-based tool to predict immunogenic peptides. The collected data were drawn from the IEDB, UniProt, and AAindex databases. Overall, 1,264 collected records were used and divided into three parts: 70% of the data was used to train, 15% to validate and 15% to test the model. Five-fold cross-validation was used to find the optimal hyperparameters of the model. Common performance metrics were used to evaluate the developed model. RESULTS Twenty-seven features were identified as more important using the RF predictor model and were used to predict the class of peptides. The RF model improved on the performance of the other predictor models (AUC±SE: 0.925±0.029). Using the developed RF model helps to identify the most likely epitopes for further experimental studies. CONCLUSION The developed random forest model is able to more accurately predict the immunogenic peptides of intracellular parasites.
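The data partition described in the METHODS can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the random seed, the rounding of the 70% and 15% shares, and the choice to run the five folds over the training part only are all assumptions, since the abstract does not give these details.

```python
import random


def split_indices(n, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle record indices and cut them into train/validation/test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains, roughly 15%
    return train, val, test


def k_fold(indices, k=5):
    """Yield (fit, held_out) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, held_out


# 1,264 is the number of records reported in the abstract.
train, val, test = split_indices(1264)
print(len(train), len(val), len(test))

# Assumption: hyperparameters are tuned by five-fold CV on the training part.
folds = list(k_fold(train, k=5))
```

With this rounding the parts hold 885, 190 and 189 records; a different rounding rule would shift a record or two between parts.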
Affiliation(s)
- Amir JAVADI
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
- Department of Medical Social Sciences, Faculty of Medicine, Qazvin University of Medical Sciences, Qazvin, Iran
- Ali KHAMESIPOUR
- Center for Research and Training in Skin Diseases and Leprosy, Tehran University of Medical Sciences, Tehran, Iran
- Marjan GHAZISAEEDI
- Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, Tehran, Iran
46
Huang K, Ji F, Xie Z, Wu D, Xu X, Gao H, Ouyang X, Xiao L, Zhou M, Zhu D, Li L. Artificial liver support system therapy in acute-on-chronic hepatitis B liver failure: Classification and regression tree analysis. Sci Rep 2019; 9:16462. [PMID: 31712684 PMCID: PMC6848208 DOI: 10.1038/s41598-019-53029-0] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/18/2019] [Accepted: 10/28/2019] [Indexed: 02/08/2023] Open
Abstract
Artificial liver support systems (ALSS) are widely used to treat patients with hepatitis B virus-related acute-on-chronic liver failure (HBV-ACLF). The aims of the present study were to investigate the subgroups of patients with HBV-ACLF who may benefit from ALSS therapy, and the relevant patient-specific factors. A total of 489 ALSS-treated HBV-ACLF patients were enrolled, and served as derivation and validation cohorts for classification and regression tree (CART) analysis. CART analysis identified three factors prognostic of survival: hepatic encephalopathy (HE), prothrombin time (PT), and total bilirubin (TBil) level; and two distinct risk groups: low (28-day mortality 10.2-39.5%) and high risk (63.8-91.1%). The CART model showed that patients lacking HE and with a PT ≤ 27.8 s and a TBil level ≤ 455 μmol/L had lower 28-day mortality after ALSS therapy. For HBV-ACLF patients with HE and a PT > 27.8 s, mortality remained high after such therapy. Patients lacking HE with a PT ≤ 27.8 s and TBil level ≤ 455 μmol/L may benefit markedly from ALSS therapy. For HBV-ACLF patients at high risk, unnecessary ALSS therapy should be avoided. The CART model is a novel user-friendly tool for screening HBV-ACLF patient eligibility for ALSS therapy, and will aid clinicians via ACLF risk stratification and therapeutic guidance.
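Read as code, the two leaf rules quoted in this abstract look roughly like the sketch below. Only the cut-offs (PT 27.8 s, TBil 455 μmol/L) and the two described patient profiles come from the abstract; the function name, constant names and return labels are hypothetical. The abstract does not describe the remaining branches of the tree, so every other combination returns None. This illustrates how CART leaves translate into rules and is not a clinical tool.

```python
from typing import Optional

PT_CUTOFF_S = 27.8          # prothrombin time cut-off, seconds (from the abstract)
TBIL_CUTOFF_UMOL_L = 455.0  # total bilirubin cut-off, umol/L (from the abstract)


def described_profile(has_he: bool, pt_seconds: float,
                      tbil_umol_l: float) -> Optional[str]:
    """Map a patient onto one of the two profiles the abstract spells out."""
    if (not has_he and pt_seconds <= PT_CUTOFF_S
            and tbil_umol_l <= TBIL_CUTOFF_UMOL_L):
        # Profile reported to have lower 28-day mortality after ALSS therapy.
        return "lower_mortality"
    if has_he and pt_seconds > PT_CUTOFF_S:
        # Profile for which mortality reportedly remained high after therapy.
        return "high_mortality"
    # Branch not described in the abstract: deliberately left undecided.
    return None
```

For example, `described_profile(False, 20.0, 300.0)` returns `"lower_mortality"`, while a patient without HE but with a TBil above the cut-off falls into a branch the abstract leaves unspecified and yields `None`.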
Affiliation(s)
- Kaizhou Huang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Feiyang Ji
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Zhongyang Xie
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Daxian Wu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Xiaowei Xu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Hainv Gao
- Shulan Hangzhou Hospital, Shulan Health, Hangzhou, Zhejiang Province, China
- Xiaoxi Ouyang
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Lanlan Xiao
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Menghao Zhou
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Danhua Zhu
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China
- Lanjuan Li
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital of Zhejiang University, College of Medicine, Zhejiang University, Hangzhou, Zhejiang Province, China.
47
Morabito MJ, Usta M, Cheng X, Zhang XF, Oztekin A, Webb EB. Prediction of Sub-Monomer A2 Domain Dynamics of the von Willebrand Factor by Machine Learning Algorithm and Coarse-Grained Molecular Dynamics Simulation. Sci Rep 2019; 9:9037. [PMID: 31227726 PMCID: PMC6588549 DOI: 10.1038/s41598-019-44044-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 10/29/2018] [Accepted: 03/15/2019] [Indexed: 11/09/2022] Open
Abstract
We develop a machine learning tool useful for predicting the instantaneous dynamical state of sub-monomer features within long linear polymer chains, as well as extracting the dominant macromolecular motions associated with sub-monomer behaviors of interest. We employ the tool to better understand and predict sub-monomer A2 domain unfolding dynamics occurring amidst the dominant large-scale macromolecular motions of the biopolymer von Willebrand Factor (vWF) immersed in flow. Results of coarse-grained Molecular Dynamics (MD) simulations of non-grafted vWF multimers subject to a shearing flow were used as input variables to a Random Forest Algorithm (RFA). Twenty unique features characterizing macromolecular conformation information of vWF multimers were used for training the RFA. The corresponding responses classify instantaneous A2 domain state as either folded or unfolded, and were directly taken from coarse-grained MD simulations. Three separate RFAs were trained using feature/response data of varying resolution, which provided deep insights into the highly correlated macromolecular dynamics occurring in concert with A2 domain unfolding events. The algorithm is used to analyze results of simulation, but has been developed for use with experimental data as well.
Affiliation(s)
- Michael J Morabito
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, 18015, United States
- Mustafa Usta
- G.W. Woodruff School of Mechanical Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, United States
- Xuanhong Cheng
- Department of Materials Science and Engineering, Lehigh University, Bethlehem, PA, 18015, United States
- Department of Bioengineering, Lehigh University, Bethlehem, PA, 18015, United States
- Xiaohui F Zhang
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, 18015, United States
- Department of Bioengineering, Lehigh University, Bethlehem, PA, 18015, United States
- Alparslan Oztekin
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, 18015, United States
- Edmund B Webb
- Department of Mechanical Engineering and Mechanics, Lehigh University, Bethlehem, PA, 18015, United States.
48
Milanese C, Payán-Gómez C, Galvani M, Molano González N, Tresini M, Nait Abdellah S, van Roon-Mom WMC, Figini S, Marinus J, van Hilten JJ, Mastroberardino PG. Peripheral mitochondrial function correlates with clinical severity in idiopathic Parkinson's disease. Mov Disord 2019; 34:1192-1202. [PMID: 31136028 PMCID: PMC6771759 DOI: 10.1002/mds.27723] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 01/14/2019] [Revised: 05/02/2019] [Accepted: 05/06/2019] [Indexed: 12/21/2022] Open
Abstract
Background Parkinson's disease is an intractable disorder with heterogeneous clinical presentation that may reflect different underlying pathogenic mechanisms. Surrogate indicators of pathogenic processes correlating with clinical measures may assist in better patient stratification. Mitochondrial function, which is impaired in and central to PD pathogenesis, may represent one such surrogate indicator. Methods Mitochondrial function was assessed by respirometry experiment in fibroblasts derived from idiopathic patients (n = 47) in normal conditions and in experimental settings that do not permit glycolysis and therefore force energy production through mitochondrial function. Respiratory parameters and clinical measures were correlated with bivariate analysis. Machine‐learning‐based classification and regression trees were used to classify patients on the basis of biochemical and clinical measures. The effects of mitochondrial respiration on α‐synuclein stress were assessed monitoring the protein phosphorylation in permitting versus restrictive glycolysis conditions. Results Bioenergetic properties in peripheral fibroblasts correlate with clinical measures in idiopathic patients, and the correlation is stronger with predominantly nondopaminergic signs. Bioenergetic analysis under metabolic stress, in which energy is produced solely by mitochondria, shows that patients’ fibroblasts can augment respiration, therefore indicating that mitochondrial defects are reversible. Forcing energy production through mitochondria, however, favors α‐synuclein stress in different cellular experimental systems. Machine‐learning‐based classification identified different groups of patients in which increasing disease severity parallels higher mitochondrial respiration. Conclusion The suppression of mitochondrial activity in PD may be an adaptive strategy to cope with concomitant pathogenic factors. Moreover, mitochondrial measures in fibroblasts are potential peripheral biomarkers to follow disease progression. © 2019 The Authors. Movement Disorders published by Wiley Periodicals, Inc. on behalf of International Parkinson and Movement Disorder Society.
Affiliation(s)
- Chiara Milanese
- Department of Molecular Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- César Payán-Gómez
- Department of Molecular Genetics, Erasmus Medical Center, Rotterdam, The Netherlands; Faculty of Natural Sciences and Mathematics, Universidad del Rosario, Bogotá, Colombia
- Marta Galvani
- Department of Mathematics, University of Pavia, Pavia, Italy
- Nicolás Molano González
- Center for Autoimmune Diseases Research, School of Medicine and Health Sciences, Universidad del Rosario, Bogotá, Colombia
- Maria Tresini
- Department of Molecular Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Soraya Nait Abdellah
- Department of Molecular Genetics, Erasmus Medical Center, Rotterdam, The Netherlands
- Silvia Figini
- Political and Social Sciences, University of Pavia, Pavia, Italy
- Johan Marinus
- Department of Neurology, Leiden University Medical Centre, Leiden, The Netherlands
- Jacobus J van Hilten
- Department of Neurology, Leiden University Medical Centre, Leiden, The Netherlands
- Pier G Mastroberardino
- Department of Molecular Genetics, Erasmus Medical Center, Rotterdam, The Netherlands; Department of Life, Health and Environmental Sciences, University of L'Aquila, L'Aquila, Italy
49
A Brief Review of Random Forests for Water Scientists and Practitioners and Their Recent History in Water Resources. WATER 2019. [DOI: 10.3390/w11050910] [Citation(s) in RCA: 102] [Impact Index Per Article: 17.0] [Indexed: 01/25/2023]
Abstract
Random forests (RF) is a supervised machine learning algorithm, which has recently started to gain prominence in water resources applications. However, existing applications are generally restricted to the implementation of Breiman’s original algorithm for regression and classification problems, while numerous developments could be also useful in solving diverse practical problems in the water sector. Here we popularize RF and their variants for the practicing water scientist, and discuss related concepts and techniques, which have received less attention from the water science and hydrologic communities. In doing so, we review RF applications in water resources, highlight the potential of the original algorithm and its variants, and assess the degree of RF exploitation in a diverse range of applications. Relevant implementations of random forests, as well as related concepts and techniques in the R programming language, are also covered.
50
Lee S, Choe EK, Park B. Exploration of Machine Learning for Hyperuricemia Prediction Models Based on Basic Health Checkup Tests. J Clin Med 2019; 8:E172. [PMID: 30717373 PMCID: PMC6406925 DOI: 10.3390/jcm8020172] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/04/2019] [Revised: 01/31/2019] [Accepted: 01/31/2019] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Machine learning (ML) is a promising methodology for classification and prediction applications in healthcare. However, this method has not been practically established for clinical data. Hyperuricemia is a biomarker of various chronic diseases. We aimed to predict uric acid status from basic healthcare checkup test results using several ML algorithms and to evaluate the performance. METHODS We designed a prediction model for hyperuricemia using a comprehensive health checkup database and ML classification algorithms, such as discrimination analysis, K-nearest neighbor, naïve Bayes (NBC), support vector machine, decision tree, and random forest classification (RFC). The performance of each algorithm was evaluated and compared with the performance of a conventional logistic regression (CLR) algorithm by receiver operating characteristic curve analysis. RESULTS Of the 38,001 participants, 7705 were hyperuricemic. For the maximum sensitivity criterion, NBC showed the highest sensitivity (0.73), and RFC showed the second highest (0.66); for the maximum balanced classification rate (BCR) criterion, RFC showed the highest BCR (0.68), and NBC showed the second highest (0.66) among the various ML algorithms for predicting uric acid status. In a comparison of the performance of NBC (area under the curve (AUC) = 0.669, 95% confidence interval (CI) = 0.669–0.675) and RFC (AUC = 0.775, 95% CI = 0.770–0.780) with that of a CLR algorithm (AUC = 0.568, 95% CI = 0.563–0.571), NBC and RFC showed significantly better performance (p < 0.001). CONCLUSIONS The ML model was superior to the CLR model for the prediction of hyperuricemia. Future studies are needed to determine the best-performing ML algorithms based on data set characteristics. We believe that this study will be informative for studies using ML tools in clinical research.
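One plausible reading of the "maximum BCR criterion" is that the decision threshold on a classifier's score is chosen to maximise the balanced classification rate. The sketch below shows that reading only; the abstract does not define the criterion, so the threshold sweep, the use of the arithmetic mean of sensitivity and specificity as BCR, the function names and the toy scores are all assumptions.

```python
def bcr_at_threshold(scores, labels, threshold):
    """BCR when every score >= threshold is called positive (label 1)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if y == 1 and s >= threshold)
    fn = sum(1 for s, y in pairs if y == 1 and s < threshold)
    tn = sum(1 for s, y in pairs if y == 0 and s < threshold)
    fp = sum(1 for s, y in pairs if y == 0 and s >= threshold)
    sens = tp / (tp + fn)  # requires at least one positive label
    spec = tn / (tn + fp)  # requires at least one negative label
    return (sens + spec) / 2


def best_threshold_by_bcr(scores, labels):
    """Sweep the observed scores as candidate thresholds; keep the best BCR."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: bcr_at_threshold(scores, labels, t))


# Hypothetical classifier scores and true labels (1 = hyperuricemic).
scores = [0.10, 0.35, 0.40, 0.80, 0.65, 0.90]
labels = [0, 0, 1, 0, 1, 1]

threshold = best_threshold_by_bcr(scores, labels)
print(threshold, round(bcr_at_threshold(scores, labels, threshold), 3))
```

On these toy values the sweep selects 0.40, where sensitivity is 1 and specificity is 2/3. A "maximum sensitivity criterion" would need an extra constraint to be meaningful, since sensitivity alone is trivially maximised by calling everyone positive; the abstract does not say which constraint was used.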
Affiliation(s)
- Sangwoo Lee
- Network Division, Samsung Electronics, Suwon 16677, Korea.
- Eun Kyung Choe
- Department of Surgery, Seoul National University Hospital Healthcare System Gangnam Center, Seoul 06236, Korea.
- Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea.
- Boram Park
- Department of Biomedical Science, Seoul National University Graduate School, Seoul 03081, Korea.