1
Hosseini N, Soflaei SS, Salehi-Sangani P, Yaghooti-Khorasani M, Shahri B, Rezaeifard H, Esmaily H, Ferns GA, Moohebati M, Ghayour-Mobarhan M. Association of Premature Ventricular Contraction (PVC) with hematological parameters: a data mining approach. Sci Rep 2025; 15:2514. [PMID: 39833257 PMCID: PMC11756411 DOI: 10.1038/s41598-025-86557-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Accepted: 01/13/2025] [Indexed: 01/22/2025]
Abstract
Premature ventricular contraction (PVC) is characterized by early repolarization of the myocardium originating from Purkinje fibers. PVC may occur in otherwise healthy individuals, but it may also be associated with pathological conditions. This study examined the association between hematological factors and PVC in 9,035 participants enrolled in the Mashhad stroke and heart atherosclerotic disorder (MASHAD) cohort study. The association was evaluated using several machine learning (ML) algorithms: logistic regression (LR), C5.0, and a boosting decision tree (DT). The dataset was divided into training and test sets, and each model's performance was assessed on the test set. All analyses were performed with SPSS version 26 and SPSS Modeler 10. Boosting DT was the most effective algorithm, with an accuracy of 98.13% for males and 96.92% for females. According to the models, red cell distribution width (RDW) and platelet count (PLT) were the most important hematological factors in both sexes. White blood cell count (WBC), platelet distribution width (PDW), and hematocrit (HCT) in males, and red blood cell count (RBC), mean corpuscular volume (MCV), and mixed cell count (MXD) in females, were also important. The ML models thus identified several hematological factors associated with PVC. Because the relationship between hematological parameters and PVC has rarely been explored, further studies in other populations are needed to confirm these results.
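The abstract names a boosting decision tree as the best model. As a rough illustration of the boosting idea only, the sketch below implements generic AdaBoost over one-split decision stumps in plain Python. It is a toy under stated assumptions, not the C5.0 boosting used in SPSS Modeler, and the helper names (`fit_stump`, `adaboost_fit`, `adaboost_predict`) and the data are hypothetical.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    # Predict `polarity` at or above the threshold, the opposite label below it
    return polarity if row[feature] >= threshold else -polarity


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best
    single-split rule. Labels in y must be -1 or +1; w are sample weights."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, polarity)
    return best


def adaboost_fit(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # uniform sample weights to start
    ensemble = []
    for _ in range(n_rounds):
        err, feature, threshold, polarity = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, feature, threshold, polarity))
        # Up-weight misclassified samples, down-weight correctly classified ones
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, polarity))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        if err <= 1e-10:  # a perfect stump leaves nothing to correct
            break
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(row, f, t, p) for alpha, f, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: one feature, labels -1 / +1 (not cohort data)
X = [[1.0], [2.0], [3.0], [4.0]]
y = [-1, -1, 1, 1]
model = adaboost_fit(X, y)
print([adaboost_predict(model, row) for row in X])  # [-1, -1, 1, 1]
```

Each round re-weights the training samples so that the next stump concentrates on the cases earlier stumps got wrong; the final prediction is a weighted vote of all stumps.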
Affiliation(s)
- Nafiseh Hosseini
- Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Sara Saffar Soflaei
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Pooria Salehi-Sangani
- Student Research Committee, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Mahdiyeh Yaghooti-Khorasani
- Radiation Oncology Research Center, Cancer Research Institute, Tehran University of Medical Sciences, Tehran, Iran
- Bahram Shahri
- Department of Cardiology, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran
- Helia Rezaeifard
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Habibollah Esmaily
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran
- Social Determinants of Health Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Gordon A Ferns
- Division of Medical Education, Brighton and Sussex Medical School, Brighton, UK
- Mohsen Moohebati
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran
- Majid Ghayour-Mobarhan
- International UNESCO Center for Health-Related Basic Sciences and Human Nutrition, Mashhad University of Medical Sciences, Mashhad, Iran
- Metabolic Syndrome Research Center, Mashhad University of Medical Sciences, Mashhad, Iran

2
Xie K, Hou Y, Zhou X. Deep centroid: a general deep cascade classifier for biomedical omics data classification. Bioinformatics 2024; 40:btae039. [PMID: 38305432 PMCID: PMC10868341 DOI: 10.1093/bioinformatics/btae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/13/2024] [Accepted: 01/30/2024] [Indexed: 02/03/2024]
Abstract
MOTIVATION Classification of samples using biomedical omics data is widely used in biomedical research. However, these datasets often have challenging characteristics, including high dimensionality, limited sample sizes, and inherent biases across diverse sources. These factors limit the performance of traditional machine learning models, particularly when they are applied to independent datasets. RESULTS To address these challenges, we propose a novel classifier, Deep Centroid, which combines the stability of the nearest centroid classifier with the strong fitting ability of the deep cascade strategy. Deep Centroid is an ensemble learning method with a multi-layer cascade structure, consisting of feature scanning and cascade learning stages that can dynamically adjust the training scale. We apply Deep Centroid to three precision medicine applications (early cancer diagnosis, cancer prognosis, and drug sensitivity prediction) using cell-free DNA fragmentation profiles, gene expression profiles, and DNA methylation data. Experimental results demonstrate that Deep Centroid outperforms six traditional machine learning models in all three applications, showcasing its potential for omics data classification. Furthermore, functional annotations reveal that the features scanned by the model are biologically meaningful, indicating that the model is interpretable from a biological perspective. Our findings underscore the promise of Deep Centroid for classifying biomedical omics data, particularly in precision medicine. AVAILABILITY AND IMPLEMENTATION Deep Centroid is available on both GitHub (github.com/xiexiexiekuan/DeepCentroid) and Figshare (https://figshare.com/articles/software/Deep_Centroid_A_General_Deep_Cascade_Classifier_for_Biomedical_Omics_Data_Classification/24993516).
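Deep Centroid builds on the nearest centroid classifier. The sketch below shows only that base classifier (one mean vector per class, prediction by the closest mean), not the feature-scanning cascade; the function names and the two-class toy data are hypothetical.

```python
import math


def fit_centroids(X, y):
    """Average the feature vectors of each class: one centroid per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_nearest(centroids, row):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(row, centroids[label]))


# Hypothetical two-feature profiles; labels are illustrative only
X = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.2]]
y = ["normal", "normal", "tumor", "tumor"]
centroids = fit_centroids(X, y)
print(predict_nearest(centroids, [0.1, 0.2]))  # normal
print(predict_nearest(centroids, [3.1, 2.8]))  # tumor
```

Because each class is summarized by a single average, the classifier has very few parameters to overfit, which is one plausible reading of the "stability" the abstract credits it with.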
Affiliation(s)
- Kuan Xie
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Yuying Hou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Xionghui Zhou
- Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China
- Key Laboratory of Smart Farming for Agricultural Animals, Huazhong Agricultural University, Wuhan 430070, People’s Republic of China

3
Ho TC, Shah R, Mishra J, May AC, Tapert SF. Multi-level predictors of depression symptoms in the Adolescent Brain Cognitive Development (ABCD) study. J Child Psychol Psychiatry 2022; 63:1523-1533. [PMID: 35307818 PMCID: PMC9489813 DOI: 10.1111/jcpp.13608] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Accepted: 02/25/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND While identifying risk factors for adolescent depression is critical for early prevention and intervention, most studies have examined isolated factors rather than a broad set of factors. Here, we sought to examine multi-level factors that maximize the prediction of depression symptoms in US children participating in the Adolescent Brain Cognitive Development (ABCD) study. METHODS A total of 7,995 participants from ABCD (release 3.0) provided complete data at baseline and at 1-year follow-up. Depression symptoms were measured with the Child Behavior Checklist. Predictive features included child demographic and environmental variables, structural and resting-state fMRI variables, and parental depression history and demographic characteristics. We used linear (elastic net regression, EN) and non-linear (gradient-boosted trees, GBT) predictive models to identify which set of features maximized prediction of depression symptoms at baseline and, separately, at 1-year follow-up. RESULTS Linear and non-linear models achieved comparable results for predicting depression at baseline (EN: MAE = 3.757, R² = 0.156; GBT: MAE = 3.761, R² = 0.147) and at 1-year follow-up (EN: MAE = 4.255, R² = 0.103; GBT: MAE = 4.262, R² = 0.089). Parental history of depression, greater family conflict, and shorter child sleep duration were among the top predictors of concurrent and future child depression symptoms in both models. Although resting-state fMRI features were relatively weak predictors, functional connectivity of the caudate was consistently the strongest neural feature associated with depression symptoms at both timepoints. CONCLUSIONS Consistent with prior research, parental mental health, family environment, and child sleep quality are important risk factors for youth depression. Functional connectivity of the caudate is a relatively weak predictor of depression symptoms but may represent a biomarker of depression risk.
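The results above are reported as mean absolute error (MAE) and R². A short sketch of how both metrics are usually computed, using made-up symptom scores rather than ABCD data:

```python
def mae(y_true, y_pred):
    """Mean absolute error: the average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up symptom scores, not ABCD data
observed = [2, 4, 6, 8]
predicted = [3, 4, 5, 8]
print(mae(observed, predicted))        # 0.5
print(r_squared(observed, predicted))  # about 0.9
```

An R² of 0.156, as for EN at baseline, means the model explains roughly 15.6% of the variance in symptom scores; predicting the sample mean for everyone gives R² = 0.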
Affiliation(s)
- Tiffany C. Ho
- Department of Psychiatry & Behavioral Sciences; Weill Institute of Neurosciences; University of California, San Francisco, San Francisco, CA
- Rutvik Shah
- Department of Psychiatry & Behavioral Sciences; Weill Institute of Neurosciences; University of California, San Francisco, San Francisco, CA
- Department of Psychiatry, University of California, San Diego, San Diego, CA
- Jyoti Mishra
- Department of Psychiatry, University of California, San Diego, San Diego, CA
- April C. May
- Department of Psychiatry, University of California, San Diego, San Diego, CA
- San Diego State University/University of California San Diego Joint Doctoral Program in Clinical Psychology, San Diego, CA
- Susan F. Tapert
- Department of Psychiatry, University of California, San Diego, San Diego, CA

4
Khodabandelu S, Ghaemian N, Khafri S, Ezoji M, Khaleghi S. Development of a Machine Learning-Based Screening Method for Thyroid Nodules Classification by Solving the Imbalance Challenge in Thyroid Nodules Data. J Res Health Sci 2022; 22:e00555. [PMID: 36511373 PMCID: PMC10422153 DOI: 10.34172/jrhs.2022.90] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/29/2022] [Revised: 07/23/2022] [Accepted: 08/02/2022] [Indexed: 12/15/2022]
Abstract
BACKGROUND This study aims to show how imbalanced data and typical evaluation methods can mislead the development and assessment of machine learning-based models for preoperative thyroid nodule screening. STUDY DESIGN A retrospective study. METHODS Ultrasonography features of 431 thyroid nodule cases were extracted from the medical records of 313 patients in Babol, Iran. Because thyroid nodules are commonly benign, the relevant data are usually class-imbalanced, which can bias learning models toward the majority class. To address this, a hybrid resampling method based on SMOTE (synthetic minority oversampling technique) was used to create balanced data. The support vector classification (SVC) algorithm was then trained in Python on the balanced and unbalanced datasets, as Models 2 and 3 respectively. Their performance was compared with that of a conventionally fitted logistic regression model (Model 1). RESULTS The prevalence of malignant nodules was 14% (n = 61). In addition, 87% of the patients were women; however, the prevalence of malignancy did not differ by sex. The accuracy, area under the curve, and geometric mean were 92.1%, 93.2%, and 76.8% for Model 1; 91.3%, 93%, and 77.6% for Model 2; and 91%, 92.6%, and 84.2% for Model 3, respectively. The results also identified microcalcification, a taller-than-wide shape, and the absence of iso- and hyperechogenicity as the features most strongly associated with malignancy. CONCLUSION Paying attention to data challenges such as class imbalance, and using proper evaluation criteria, can improve the performance of machine learning models for preoperative thyroid nodule screening.
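The abstract combines resampling with the geometric mean (G-mean) of sensitivity and specificity. The sketch below shows plain SMOTE, which interpolates between a minority sample and one of its nearest minority neighbours, plus the G-mean formula. It is an assumption, not the study's pipeline: a hybrid method typically pairs oversampling with an undersampling or cleaning step, which this toy omits, and all names and data here are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples. Each one lies on the line
    segment between a random minority sample and one of its k nearest
    minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # how far along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


def g_mean(tp, fn, tn, fp):
    """Geometric mean of sensitivity (recall on malignant cases) and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical minority-class (malignant) feature vectors
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]]
new_samples = smote(minority, n_new=5)
print(len(new_samples))  # 5

# A model that calls every one of the 431 nodules benign (61 malignant, 370 benign)
print(g_mean(tp=0, fn=61, tn=370, fp=0))  # 0.0
```

The last line shows why G-mean matters here: a model that labels all 431 nodules benign reaches about 86% accuracy (370/431) yet scores a G-mean of 0.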
Affiliation(s)
- Sajad Khodabandelu
- Student Research Committee, School of Medicine, Faculty of Health, Babol University of Medical Sciences, Babol, Iran
- Naser Ghaemian
- Department of Radiology, Babol University of Medical Sciences, Babol, Iran
- Soraya Khafri
- Research Center for Social Determinants of Health, Health Research Institute, Department of Biostatistics and Epidemiology, Faculty of Health, Babol University of Medical Sciences, Babol, Iran
- Mehdi Ezoji
- Faculty of Electrical and Computer Engineering, Babol Noshirvani University of Technology, Babol, Iran
- Sara Khaleghi
- Student Research Committee, School of Medicine, Faculty of Health, Babol University of Medical Sciences, Babol, Iran

5
El Barakaz F, Boutkhoum O, Hanine M, El Moutaouakkil A, Rustam F, Din S, Ashraf I. Optimization of Imbalanced and Multidimensional Learning Under Bayes Minimum Risk and Savings Measure. Big Data 2022; 10:425-439. [PMID: 35723636 DOI: 10.1089/big.2021.0225] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
Imbalanced and high-dimensional data limit the full potential of data analysis, which makes both topics important. Consequently, substantial research has been directed at dimensionality reduction and at resolving data imbalance, especially in fraud detection. This work investigates the effectiveness of hybrid learning methods that alleviate class imbalance and integrate dimensionality reduction techniques. To this end, the study examines different classification combinations to achieve optimal savings and improve classification performance. Several well-known machine learning models are selected: logistic regression, random forest, CatBoost (CB), and XGBoost. These models are constructed and optimized using Bayes minimum risk (BMR) combined with the synthetic minority oversampling technique (SMOTE) and different univariate and multivariate feature selection (FS) techniques. To evaluate the proposed approach, several scenarios are analyzed: with and without balancing, with and without FS, and with BMR optimization. The main finding is that BMR performs best when combined with SMOTE, symmetrical uncertainty for FS, and CB as the boosted classifier, principally in terms of F1 score and savings.
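Bayes minimum risk (BMR) replaces the fixed 0.5 threshold with a cost-based decision, and savings compares a model's total cost with that of the cheaper trivial policy (flag nothing or flag everything). A minimal sketch assuming correct decisions cost nothing; the cost values, labels, and probabilities are invented, and the paper's actual cost matrix may differ.

```python
def bmr_decision(p, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    """Bayes minimum risk: predict 1 when the expected cost of predicting 1
    is no larger than the expected cost of predicting 0."""
    risk_positive = p * c_tp + (1 - p) * c_fp
    risk_negative = p * c_fn + (1 - p) * c_tn
    return 1 if risk_positive <= risk_negative else 0


def total_cost(y_true, y_pred, c_fp, c_fn, c_tp=0.0, c_tn=0.0):
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if p == 1:
            cost += c_tp if t == 1 else c_fp
        else:
            cost += c_fn if t == 1 else c_tn
    return cost


def savings(y_true, y_pred, c_fp, c_fn):
    """Fraction of cost saved relative to the cheaper of 'flag nothing'
    and 'flag everything' (correct decisions assumed free)."""
    n = len(y_true)
    baseline = min(total_cost(y_true, [0] * n, c_fp, c_fn),
                   total_cost(y_true, [1] * n, c_fp, c_fn))
    return (baseline - total_cost(y_true, y_pred, c_fp, c_fn)) / baseline


# Invented example: a missed fraud costs 10, a false alarm costs 1
y_true = [1, 0, 0, 0]
probs = [0.4, 0.2, 0.05, 0.1]
y_pred = [bmr_decision(p, c_fp=1.0, c_fn=10.0) for p in probs]
print(y_pred)                                        # [1, 1, 0, 1]
print(savings(y_true, y_pred, c_fp=1.0, c_fn=10.0))  # about 0.333
```

With a false negative ten times as costly as a false positive, the rule flags any case with p at or above 1/11, so the third case (p = 0.05) is the only one left unflagged. A plain 0.5 threshold would flag nothing here and end up with negative savings.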
Affiliation(s)
- Fatima El Barakaz
- Laroseri Laboratory, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco
- Omar Boutkhoum
- Laroseri Laboratory, Faculty of Sciences, Chouaib Doukkali University, El Jadida, Morocco
- Mohamed Hanine
- Department of Telecommunications, Networks and Informatics, LTI Laboratory, ENSA, Chouaib Doukkali University, El Jadida, Morocco
- Furqan Rustam
- Department of Software Engineering, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Sadia Din
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Republic of Korea
- Imran Ashraf
- Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Republic of Korea

6
Boo Y, Choi Y. Comparison of mortality prediction models for road traffic accidents: an ensemble technique for imbalanced data. BMC Public Health 2022; 22:1476. [PMID: 35918672 PMCID: PMC9344638 DOI: 10.1186/s12889-022-13719-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/01/2021] [Accepted: 06/27/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Injuries caused by road traffic accidents (RTAs) are classified under the International Classification of Diseases-10 as 'S00-T99' and form imbalanced samples, with a mortality rate of only 1.2% among all RTA victims. To predict the characteristics of external causes of RTA injuries and mortality, we compared performance across different correction and classification techniques for imbalanced samples. METHODS The present study used data spanning a 5-year period (2013-2017) from the Korean National Hospital Discharge In-depth Injury Survey (KNHDS), a national-level survey conducted by the Korea Disease Control and Prevention Agency. A total of eight variables were used in the prediction, including patient, accident, and injury/disease characteristics. Because the data were imbalanced, a sample consisting only of severe injuries was constructed and compared against the total sample. Preprocessing took the characteristics of the samples into account; in particular, the samples were first standardized because they contained many variables measured in different units. Among ensemble classification techniques, the study used Random Forest, Extra-Trees, and XGBoost. Four different over- and under-sampling techniques were applied, and algorithm performance was compared using accuracy, precision, recall, F1, and the Matthews correlation coefficient (MCC). RESULTS Among the prediction techniques, XGBoost performed best. While the synthetic minority oversampling technique (SMOTE), a type of over-sampling, also performed reasonably well, under-sampling was superior. Overall, prediction by the XGBoost model with samples using SMOTE produced the best results. CONCLUSION This study presented an empirical comparison of how sampling techniques and classification algorithms, used in combination, affect prediction accuracy on imbalanced samples.
The findings could be used as reference data in classification analyses of imbalanced data in the medical field.
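The comparison uses MCC alongside accuracy because, at a 1.2% mortality rate, accuracy rewards a model that simply predicts survival for everyone. A minimal sketch of both metrics from confusion-matrix counts; the counts are hypothetical:

```python
import math


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined
    (a row or column of the confusion matrix is empty)."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical: 1,000 victims, 12 deaths (1.2%), model predicts "survived" for all
print(accuracy(tp=0, tn=988, fp=0, fn=12))  # 0.988
print(mcc(tp=0, tn=988, fp=0, fn=12))       # 0.0
```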
Affiliation(s)
- Yookyung Boo
- Department of Health Administration, Dankook University, Cheonan, 31116, South Korea
- Youngjin Choi
- Department of Healthcare Management, Eulji University, Seongnam, 13135, South Korea

7
Xu Y, Yu Z, Chen CLP. Classifier Ensemble Based on Multiview Optimization for High-Dimensional Imbalanced Data Classification. IEEE Transactions on Neural Networks and Learning Systems 2022; PP:870-883. [PMID: 35657843 DOI: 10.1109/tnnls.2022.3177695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
High-dimensional class-imbalanced data seriously degrade the performance of classification algorithms. Because of the large number of redundant or invalid features and the class imbalance, it is difficult to construct an optimal classifier for such data. Classifier ensembles have attracted intensive attention because they can outperform an individual classifier. In this work, we propose a multiview optimization (MVO) method to learn more effective and robust features from high-dimensional imbalanced data, on which an accurate and robust ensemble system is built. Specifically, an optimized subview generation (OSG) step in MVO first generates multiple optimized subviews from different scenarios, which simultaneously strengthens the discriminative ability of the features and increases the diversity of ensemble members. Second, a new evaluation criterion that considers the data distribution in each optimized subview is developed, and on that basis a selective ensemble of optimized subviews (SEOS) is designed to perform the subview selective ensemble. Finally, an oversampling approach is applied to the optimized view to obtain a new class-rebalanced subset for the classifier. Experimental results on 25 high-dimensional class-imbalanced datasets indicate that the proposed method outperforms other mainstream classifier ensemble methods.

8
Choi S, Park J, Park S, Byon I, Choi HY. Establishment of a prediction tool for ocular trauma patients with machine learning algorithm. Int J Ophthalmol 2021; 14:1941-1949. [PMID: 34926212 DOI: 10.18240/ijo.2021.12.20] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/21/2021] [Accepted: 10/22/2021] [Indexed: 11/23/2022]
Abstract
AIM To predict final visual acuity and analyze significant factors influencing open globe injury prognosis. METHODS Prediction models were built using a supervised classification algorithm from Microsoft Azure Machine Learning Studio. The best algorithm was selected to analyze the predicted final visual acuity. We retrospectively reviewed the data of 171 patients with open globe injury who visited the Pusan National University Hospital between January 2010 and July 2020. We then applied cross-validation, the permutation feature importance method, and the synthetic minority over-sampling technique to enhance tool performance. RESULTS The two-class boosted decision tree model showed the best predictive performance. The accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve were 0.925, 0.962, 0.833, 0.893, and 0.971, respectively. To increase the efficiency and efficacy of the prognostic tool, the top 14 features were finally selected using the permutation feature importance method: (listed in the order of importance) retinal detachment, location of laceration, initial visual acuity, iris damage, surgeon, past history, size of the scleral laceration, vitreous hemorrhage, trauma characteristics, age, corneal injury, primary diagnosis, wound location, and lid laceration. CONCLUSION Here we devise a highly accurate model to predict the final visual acuity of patients with open globe injury. This tool is useful and easily accessible to doctors and patients, reducing the socioeconomic burden. With further multicenter verification using larger datasets and external validation, we expect this model to become useful worldwide.
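Permutation feature importance, used here to select the top 14 features, scores a feature by how much a fitted model's performance drops when that feature's values are shuffled. The sketch below uses accuracy as the score and a hypothetical rule-based model; Azure Machine Learning Studio's own module may use different metrics and defaults.

```python
import random


def accuracy_score(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when one feature column is shuffled, which breaks
    that feature's link to the outcome while leaving other columns intact."""
    rng = random.Random(seed)
    baseline = accuracy_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy_score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical model that only looks at feature 0 (a binary flag)
    return 1 if row[0] > 0.5 else 0


X = [[0, 3], [1, 7], [0, 1], [1, 2], [0, 9], [1, 4]]
y = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(toy_model, X, y)
print(importances)
```

Feature 1 gets an importance of exactly 0.0 because the model never reads it, while shuffling feature 0 lowers accuracy and gives it a positive score; ranking features by this score is how a shortlist such as the top 14 can be chosen.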
Affiliation(s)
- Seungkwon Choi
- Department of Ophthalmology, Pusan National University Hospital, Busan 49241, Republic of Korea
- Biomedical Research Institute of Pusan National University Hospital, Busan 49241, Republic of Korea
- Jungyul Park
- Department of Ophthalmology, Pusan National University Hospital, Busan 49241, Republic of Korea
- Biomedical Research Institute of Pusan National University Hospital, Busan 49241, Republic of Korea
- Sungwho Park
- Department of Ophthalmology, Pusan National University Hospital, Busan 49241, Republic of Korea
- Biomedical Research Institute of Pusan National University Hospital, Busan 49241, Republic of Korea
- Iksoo Byon
- Department of Ophthalmology, Pusan National University Hospital, Busan 49241, Republic of Korea
- Biomedical Research Institute of Pusan National University Hospital, Busan 49241, Republic of Korea
- Hee-Young Choi
- Department of Ophthalmology, Pusan National University Hospital, Busan 49241, Republic of Korea
- Biomedical Research Institute of Pusan National University Hospital, Busan 49241, Republic of Korea
- Department of Ophthalmology, School of Medicine, Pusan National University, Busan 49241, Republic of Korea

9
Alsenan S, Al-Turaiki I, Hafez A. A deep learning approach to predict blood-brain barrier permeability. PeerJ Comput Sci 2021; 7:e515. [PMID: 34179448 PMCID: PMC8205267 DOI: 10.7717/peerj-cs.515] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/20/2020] [Accepted: 04/08/2021] [Indexed: 06/13/2023]
Abstract
The blood-brain barrier plays a crucial role in regulating the passage of 98% of the compounds that enter the central nervous system (CNS). Compounds with high permeability must be identified to enable the synthesis of medications for brain diseases such as Parkinson's disease, Alzheimer's disease, and brain tumors. Over the years, several models have been developed for this problem and have achieved acceptable accuracy in predicting compounds that penetrate the blood-brain barrier. However, predicting compounds with "low" permeability has remained challenging. In this study, we present a deep learning (DL) classification model to predict blood-brain barrier permeability. The proposed model addresses the fundamental issues of earlier models: high dimensionality, class imbalance, and low specificity. We address these issues by enhancing the high-dimensional, imbalanced dataset before developing the classification model: class imbalance is handled with oversampling techniques, and high dimensionality with a non-linear dimensionality reduction technique known as kernel principal component analysis (KPCA). This technique transforms the high-dimensional dataset into a low-dimensional Euclidean space while retaining valuable information. For the classification task, we developed an enhanced feed-forward deep learning model and a convolutional neural network model. In terms of specificity (i.e., correctly predicting compounds with low permeability), the enhanced feed-forward deep learning model outperformed other models in the literature that were developed using the same technique. In addition, the proposed convolutional neural network model surpassed models from other studies on multiple measures, including overall accuracy and specificity.
The proposed approach addresses the low specificity, and the resulting high false-positive rate, that earlier models faced.
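KPCA, used above for dimensionality reduction, applies PCA to a centred kernel (similarity) matrix instead of the raw features. A minimal sketch, assuming an RBF kernel and extracting only the first component by power iteration; the kernel choice, `gamma`, the function names, and the toy points are assumptions, not the paper's settings.

```python
import math
import random


def rbf_kernel(a, b, gamma):
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def kernel_pca_first_component(X, gamma=0.5, n_iter=200, seed=0):
    """Project the training points onto the first kernel principal component."""
    n = len(X)
    K = [[rbf_kernel(a, b, gamma) for b in X] for a in X]
    # Double-centre the (symmetric) kernel matrix: centring in feature space
    mean = [sum(row) / n for row in K]
    grand = sum(mean) / n
    Kc = [[K[i][j] - mean[i] - mean[j] + grand for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector v of Kc
    rng = random.Random(seed)
    v = [rng.random() for _ in range(n)]
    for _ in range(n_iter):
        w = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Kv = [sum(Kc[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, Kv))
    # Projection of point i is (Kc v)_i / sqrt(eigenvalue)
    return [x / math.sqrt(eigenvalue) for x in Kv]


# Hypothetical 2-D points forming two tight, well-separated clusters
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
z = kernel_pca_first_component(X)
print([round(value, 2) for value in z])
```

The two clusters land on opposite sides of zero along the first component, even though the kernel never computes coordinates in the implicit feature space. Full KPCA would keep several components (deflating or using an eigensolver) rather than one.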
Affiliation(s)
- Shrooq Alsenan
- Information Systems Department, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Isra Al-Turaiki
- Information Technology Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Alaaeldin Hafez
- Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia

10
Ding YC, Wu H, Davicioni E, Karnes RJ, Klein EA, Den RB, Steele L, Neuhausen SL. Prostate cancer in young men represents a distinct clinical phenotype: gene expression signature to predict early metastases. Journal of Translational Genetics and Genomics 2021; 5:50-61. [PMID: 33928239 PMCID: PMC8081383 DOI: 10.20517/jtgg.2021.01] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 04/19/2023]
Abstract
AIM Several genomic signatures are available to predict prostate cancer (CaP) outcomes based on gene expression in prostate tissue. However, no signature has been tailored to predict aggressive CaP in younger men. We attempted to develop a gene signature to predict the development of metastatic CaP in young men. METHODS We measured genome-wide gene expression in 119 tumor and matched benign tissues from prostatectomies of men diagnosed at ≤ 50 years and > 70 years, and identified age-related differentially expressed genes (DEGs) for tissue type and Gleason score. Age-related DEGs were selected using the improved Prediction Analysis of Microarray method (iPAM) to construct and validate a classifier to predict metastasis using gene expression data from 1,232 prostatectomies. Accuracy in predicting early metastasis was quantified by the area under the receiver operating characteristic (ROC) curve (AUC), and the abundance of immune cells in the tissue microenvironment was estimated from gene expression data. RESULTS Thirty-six age-related DEGs were selected for the iPAM classifier. The AUC of the five-year survival ROC for the iPAM classifier was 0.87 (95% CI: 0.78-0.94) in young (≤ 55 years), 0.82 (95% CI: 0.76-0.88) in middle-aged (56-70 years), and 0.69 (95% CI: 0.55-0.69) in old (> 70 years) patients. Metastasis-associated immune responses in the tumor microenvironment were more pronounced in young and middle-aged patients than in old ones, potentially explaining the difference in prediction accuracy among the groups. CONCLUSION We developed a genomic classifier with high precision to predict early metastasis in younger CaP patients and identified age-related differences in the immune response to metastasis development.
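An AUC such as those above equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes that ordinary binary AUC by pairwise comparison; the study's five-year survival ROC is a time-dependent variant that this toy does not implement, and the scores are hypothetical.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative
    (ties count half): the area under the ROC curve."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


metastasis = [0.9, 0.8, 0.4]   # hypothetical classifier scores
no_metastasis = [0.7, 0.3]
print(auc(metastasis, no_metastasis))  # 0.8333333333333334
```

Five of the six positive/negative pairs are ranked correctly (only 0.4 versus 0.7 is not), giving 5/6.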
Affiliation(s)
- Yuan C. Ding
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, CA 91010, USA
- Huiqing Wu
- Department of Pathology, City of Hope Medical Center, Duarte, California, CA 91010, USA
- Elai Davicioni
- GenomeDX Biosciences, Vancouver, British Columbia V6B 1B8, Canada
- R. Jeffrey Karnes
- Department of Urology, Mayo Clinic, Rochester, Minnesota, MN 55905, USA
- Eric A. Klein
- Glickman Urological and Kidney Institute, Cleveland Clinic, Ohio, OH 44195, USA
- Robert B. Den
- Department of Radiation Oncology, Thomas Jefferson University, Philadelphia, Pennsylvania, PA 19044, USA
- Linda Steele
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, CA 91010, USA
- Susan L. Neuhausen
- Department of Population Sciences, Beckman Research Institute of City of Hope, Duarte, California, CA 91010, USA

11
Abstract
Neutropenia is a common side effect of myelosuppressive chemotherapy and is associated with adverse outcomes. Early Warning Scores are used to identify at-risk patients and facilitate rapid clinical interventions. Since few Early Warning Scores have been validated in patients with neutropenia, we aimed to create predictive models and nomograms of fever, ICU transfer, and mortality in hospitalized neutropenic patients.

12
Mboya IB, Mahande MJ, Mohammed M, Obure J, Mwambi HG. Prediction of perinatal death using machine learning models: a birth registry-based cohort study in northern Tanzania. BMJ Open 2020; 10:e040132. [PMID: 33077570 PMCID: PMC7574940 DOI: 10.1136/bmjopen-2020-040132] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/12/2022]
Abstract
OBJECTIVE We aimed to determine the key predictors of perinatal deaths using machine learning models compared with the logistic regression model. DESIGN A secondary data analysis using the Kilimanjaro Christian Medical Centre (KCMC) Medical Birth Registry cohort from 2000 to 2015. We assessed the discriminative ability of models using the area under the receiver operating characteristic curve (AUC) and the net benefit using decision curve analysis. SETTING The KCMC is a zonal referral hospital located in Moshi Municipality, Kilimanjaro region, Northern Tanzania. The Medical Birth Registry is within the hospital grounds at the Reproductive and Child Health Centre. PARTICIPANTS Singleton deliveries (n=42 319) with complete records from 2000 to 2015. PRIMARY OUTCOME MEASURES Perinatal death (composite of stillbirths and early neonatal deaths). These outcomes were captured only before mothers were discharged from the hospital. RESULTS The proportion of perinatal deaths was 3.7%. There were no statistically significant differences in the predictive performance of four machine learning models except for bagging, which had a significantly lower performance (AUC 0.76, 95% CI 0.74 to 0.79, p=0.006) compared with the logistic regression model (AUC 0.78, 95% CI 0.76 to 0.81). However, in the decision curve analysis, the machine learning models had a higher net benefit (ie, the correct classification of perinatal deaths considering a trade-off between false-negatives and false-positives) over the logistic regression model across a range of threshold probability values. CONCLUSIONS In this cohort, there was no significant difference in the prediction of perinatal deaths between machine learning and logistic regression models, except for bagging. The machine learning models had a higher net benefit, as their predictive ability for perinatal death was considerably superior to that of the logistic regression model.
The machine learning models, as demonstrated by our study, can be used to improve the prediction of perinatal deaths and triage for women at risk.
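The net benefit used in this decision curve analysis weighs true positives against false positives at a chosen threshold probability. A minimal Python sketch, assuming the standard decision-curve definition NB = TP/n - (FP/n) * p_t/(1 - p_t) (the abstract does not state the formula) and a "treat all" reference strategy for comparison; the function names are hypothetical:

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold.

    y_true: observed outcomes (1 = perinatal death, 0 = survival)
    y_prob: predicted probabilities from any model (logistic regression, bagging, ...)
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    flagged = [y for y, p in zip(y_true, y_prob) if p >= threshold]
    tp = sum(1 for y in flagged if y == 1)
    fp = len(flagged) - tp  # assumes 0/1 labels
    # odds at the threshold: harm of one false positive relative to the benefit of one true positive
    weight = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * weight


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy that flags every patient as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)
```

Evaluating both functions over a grid of thresholds traces the decision curves; a model adds value at thresholds where its net benefit exceeds both the treat-all curve and zero (treat none).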
Affiliation(s)
- Innocent B Mboya
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
- Department of Epidemiology and Biostatistics, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Michael J Mahande
- Department of Epidemiology and Biostatistics, Institute of Public Health, Kilimanjaro Christian Medical University College, Moshi, Tanzania
- Mohanad Mohammed
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
- Joseph Obure
- Department of Obstetrics and Gynecology, Kilimanjaro Christian Medical Center, Moshi, Tanzania
- Henry G Mwambi
- School of Mathematics, Statistics, and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, KwaZulu-Natal, South Africa
13
Gene Expression Clustering and Selected Head and Neck Cancer Gene Signatures Highlight Risk Probability Differences in Oral Premalignant Lesions. Cells 2020; 9:cells9081828. [PMID: 32756466 PMCID: PMC7466020 DOI: 10.3390/cells9081828] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/30/2020] [Revised: 07/27/2020] [Accepted: 07/31/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND Oral premalignant lesions (OPLs) represent the most common oral precancerous conditions. One of the major challenges in this field is identifying the OPLs at higher risk of developing oral squamous cell cancer (OSCC) by discovering molecular pathways deregulated in the early steps of malignant transformation. Analysis of deregulated levels of single genes and pathways has been successfully applied to head and neck squamous cell cancers (HNSCC) and OSCC, with prognostic/predictive implications. Exploiting the availability of gene expression profiles and clinical follow-up information for a well-characterized cohort of OPL patients, we aim to dissect tissue OPL gene expression to identify molecular clusters/signatures associated with oral cancer free survival (OCFS). MATERIALS AND METHODS The gene expression data of 86 OPL patients were challenged with: an HNSCC-specific model of six molecular subtypes (immune related: HPV related, Defense Response and Immunoreactive; Mesenchymal, Hypoxia and Classical); one OSCC-specific signature (13 genes); two metabolism-related signatures (3 genes and signatures raised from 6 metabolic pathways associated with prognosis in HNSCC and OSCC, respectively); and a hypoxia gene signature. The molecular stratification and high versus low expression of the signatures were correlated with OCFS by Kaplan-Meier analyses. The association of gene expression profiles with the tested biological models and clinical covariates was tested through variance partition analysis. RESULTS Patients with Mesenchymal, Hypoxia and Classical clusters showed a higher risk of malignant transformation than immune-related ones (log-rank test, p = 0.0052), and they expressed four enriched hallmarks: "TGF beta signaling", "angiogenesis", "unfolded protein response" and "apical junction". Overall, 54 cases fell into the immune-related clusters, while the remaining 32 cases belonged to the other clusters.
No other signatures showed an association with OCFS. Our variance partition analysis showed that clinical and molecular features explain only 21% of the variability in the gene expression data, while the remaining 79% is residual variance independent of known parameters. CONCLUSIONS Applying the existing signatures derived from HNSCC to OPL, we identified only a protective effect for immune-related signatures. Other gene expression profiles derived from overt cancers were not able to identify the risk of malignant transformation, possibly because they are linked to later stages of cancer progression. A new, well-characterized set of OPL patients and further research are needed to improve the identification of adequate prognosticators in OPLs.
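The oral cancer free survival comparisons in this abstract rest on the Kaplan-Meier product-limit estimator. A minimal pure-Python sketch, assuming each patient contributes a follow-up time and an event flag (1 = malignant transformation observed, 0 = censored); the function name is hypothetical, and the log-rank test that compares the resulting curves is not shown:

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) estimate of the survival function.

    times:  follow-up time for each patient
    events: 1 if the event (e.g. transformation to cancer) was observed, 0 if censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events_at_t = 0
        leaving = 0
        # group all patients whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            events_at_t += data[i][1]
            leaving += 1
            i += 1
        if events_at_t > 0:
            # multiply by the conditional probability of surviving past t
            survival *= 1.0 - events_at_t / at_risk
            curve.append((t, survival))
        # both events and censored patients leave the risk set after t
        at_risk -= leaving
    return curve
```

Computing one curve per group (for example, immune-related versus other clusters) gives the curves that a log-rank test would then compare.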
14
Pei W, Xue B, Shang L, Zhang M. Genetic programming for high-dimensional imbalanced classification with a new fitness function and program reuse mechanism. Soft comput 2020. [DOI: 10.1007/s00500-020-05056-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
15
Molecular subtypes of triple-negative breast cancer in women of different race and ethnicity. Oncotarget 2019; 10:198-208. [PMID: 30719214 PMCID: PMC6349443 DOI: 10.18632/oncotarget.26559] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 09/21/2018] [Accepted: 12/27/2018] [Indexed: 12/22/2022]
Abstract
Molecular subtypes of triple-negative breast cancer (TNBC) are associated with variation in survival and may assist in treatment selection. However, the association of patient race or ethnicity with subtypes of TNBC and clinical outcome has not been addressed. Using nCounter Gene Expression Codesets, we classified TNBCs into subtypes: basal-like immune-activated (BLIA), basal-like immunosuppressed (BLIS), luminal androgen receptor (LAR), and mesenchymal (MES) in 48 Hispanic, 12 African-American, 21 Asian, and 34 White patients. Mean age at diagnosis was significantly associated with subtype, with the youngest mean age (50 years) in MES and the oldest mean age (64 years) in LAR (p < 0.0005). Subtype was significantly associated with tumor grade (p = 0.0012) and positive lymph nodes (p = 0.021), and marginally associated with tumor stage (p = 0.076). In multivariate Cox proportional hazards modeling, BLIS was associated with the worst survival and LAR with the best survival. Hispanics had a significantly higher proportion of BLIS (53%, p = 0.03), whereas Asians had a lower proportion of BLIS (19%, p = 0.05) and a higher proportion of LAR (38%, p = 0.06) compared with the average proportion across all groups. These differences in the proportions of subtypes across racial and ethnic groups may explain differences in their outcomes. Determining subtypes of TNBC facilitates understanding of the heterogeneity of TNBCs and provides a foundation for developing subtype-specific therapies and better predictors of TNBC prognosis for all races and ethnicities.
16
Wang Z, Yang H, Wu Z, Wang T, Li W, Tang Y, Liu G. In Silico Prediction of Blood-Brain Barrier Permeability of Compounds by Machine Learning and Resampling Methods. ChemMedChem 2018; 13:2189-2201. [PMID: 30110511 DOI: 10.1002/cmdc.201800533] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 08/08/2018] [Indexed: 12/14/2022]
Abstract
The blood-brain barrier (BBB), as part of absorption, protects the central nervous system by separating the brain tissue from the bloodstream. In recent years, BBB permeability has become a critical issue in chemical ADMET prediction, but almost all models were built using imbalanced data sets, which caused a high false-positive rate. Therefore, we tried to solve the problem of biased data sets and built a reliable classification model with 2358 compounds. Machine learning and resampling methods were used together to refine models that represented the chemicals with both 2D molecular descriptors and molecular fingerprints. Through a series of evaluations, we found that resampling methods such as the Synthetic Minority Oversampling Technique (SMOTE) and SMOTE combined with edited nearest neighbors (SMOTE+ENN) could effectively address the imbalanced data sets, and that the MACCS fingerprint combined with a support vector machine performed best. After the final construction of a consensus model, the overall accuracy increased to 0.966 on the final external data set. The accuracy of the model on the test set was 0.919, with a well-balanced capacity to predict BBB-positive compounds (sensitivity 0.925) and BBB-negative compounds (specificity 0.899). Compared with other BBB classification models, our models reduced the rate of false positives and were more robust in predicting both BBB-positive and BBB-negative compounds, which would be quite helpful in early drug discovery.
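SMOTE, one of the resampling methods named in this abstract, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority-class neighbours. A minimal pure-Python sketch of that oversampling step only, assuming numeric feature vectors; the edited-nearest-neighbour cleaning of SMOTE+ENN is not shown, and the function name and defaults are assumptions rather than the authors' settings:

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate n_synthetic new minority-class samples (lists of floats).

    Each new sample lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # indices of the k nearest other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nn)])
    return synthetic
```

Interpolating binary fingerprint bits such as MACCS keys yields fractional values, so this plain form suits continuous descriptors best; maintained implementations such as `SMOTE` and `SMOTEENN` in the imbalanced-learn package are the usual choice in practice.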
Affiliation(s)
- Zhuang Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Hongbin Yang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Zengrui Wu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Tianduanyi Wang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Weihua Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Yun Tang
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
- Guixia Liu
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China
17
Bukowski R, Sadovsky Y, Goodarzi H, Zhang H, Biggio JR, Varner M, Parry S, Xiao F, Esplin SM, Andrews W, Saade GR, Ilekis JV, Reddy UM, Baldwin DA. Onset of human preterm and term birth is related to unique inflammatory transcriptome profiles at the maternal fetal interface. PeerJ 2017; 5:e3685. [PMID: 28879060 PMCID: PMC5582610 DOI: 10.7717/peerj.3685] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 03/26/2017] [Accepted: 07/22/2017] [Indexed: 12/18/2022]
Abstract
Background Preterm birth is a main determinant of neonatal mortality and morbidity and a major contributor to overall mortality and the burden of disease. However, research on preterm birth is hindered by the imprecise definition of the clinical phenotype and the complexity of the molecular phenotype, owing to the multiple pregnancy tissue types and molecular processes that may contribute to preterm birth. Here we comprehensively evaluate the mRNA transcriptome that characterizes preterm and term labor in tissues comprising the pregnancy, using precisely phenotyped samples. The four complementary phenotypes together provide comprehensive insight into preterm and term parturition. Methods Samples of maternal blood, chorion, amnion, placenta, decidua, fetal blood, and myometrium from the uterine fundus and lower segment (n = 183) were obtained during cesarean delivery from women with four complementary phenotypes: preterm delivery with (PL) and without labor (PNL), and term delivery with (TL) and without labor (TNL). A total of 35 pregnant women with four precisely and prospectively defined phenotypes were enrolled: PL (n = 8), PNL (n = 10), TL (n = 7) and TNL (n = 10). Gene expression data were analyzed using shrunken centroid analysis to identify a minimal set of genes that uniquely characterizes each of the four phenotypes. Expression profiles of 73 genes and non-coding RNA sequences uniquely identified each of the four phenotypes. Shrunken centroid analysis and 10 times 10-fold cross-validation were also used to minimize false-positive findings and overfitting. We identified the pathways and molecular processes associated with the set of genes whose expression uniquely characterized the four phenotypes, as well as the cis-regulatory elements in these genes' 5′ promoter or 3′-UTR regions. Results The largest differences in gene expression among the four groups occurred at the maternal-fetal interface in decidua, chorion and amnion.
The gene expression profiles showed suppression of chemokine expression in TNL, withdrawal of this suppression in TL, activation of multiple pathways of inflammation in PL, and an immune rejection profile in PNL. The genes constituting the expression signatures showed over-representation of three putative regulatory elements in their 5′ and 3′ UTR regions. Conclusions The results suggest that pregnancy is maintained by downregulation of chemokines at the maternal-fetal interface. Withdrawal of this downregulation results in term birth, whereas its overriding by the activation of multiple immune-system pathways results in preterm birth. Complications of pregnancy associated with impaired placental function, which necessitated premature delivery of the fetus in the absence of labor, show gene expression patterns associated with immune rejection.
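The 10 times 10-fold cross-validation mentioned in this abstract repeats ordinary 10-fold cross-validation on ten independent random shuffles, so every sample is held out exactly once per repeat. A minimal Python sketch of the index splitting, assuming unstratified folds over independent samples; the function name is hypothetical, and the model fitting (here, the shrunken centroid analysis) would run inside the loop over splits:

```python
import random


def repeated_kfold(n_samples, n_splits=10, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) for n_repeats reshuffled rounds of n_splits-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # spread samples as evenly as possible: the first (n % k) folds get one extra
        fold_sizes = [
            n_samples // n_splits + (1 if f < n_samples % n_splits else 0)
            for f in range(n_splits)
        ]
        start = 0
        for size in fold_sizes:
            test = indices[start:start + size]
            train = indices[:start] + indices[start + size:]
            yield sorted(train), sorted(test)
            start += size
```

If several samples come from the same woman, placing all of her samples in the same fold avoids leakage between training and test sets; this sketch neither groups nor stratifies by phenotype.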
Affiliation(s)
- Radek Bukowski
- Dell Medical School, Department of Women's Health, University of Texas at Austin, Austin, TX, United States of America
- Yoel Sadovsky
- Magee-Womens Research Institute, University of Pittsburgh, Pittsburgh, PA, United States of America
- Hani Goodarzi
- Department of Biophysics & Biochemistry, University of California, San Francisco, San Francisco, CA, United States of America
- Heping Zhang
- School of Public Health, Department of Biostatistics, Yale University, New Haven, CT, United States of America
- Joseph R Biggio
- School of Medicine, Department of Obstetrics and Gynecology, University of Alabama - Birmingham, Birmingham, AL, United States of America
- Michael Varner
- School of Medicine, Intermountain Healthcare, Department of Obstetrics and Gynecology, University of Utah, Salt Lake City, UT, United States of America
- Samuel Parry
- School of Medicine, Department of Obstetrics and Gynecology, University of Pennsylvania, Philadelphia, PA, United States of America
- Feifei Xiao
- Arnold School of Public Health, Department of Epidemiology and Biostatistics, University of South Carolina, Columbia, SC, United States of America
- Sean M Esplin
- School of Medicine, Intermountain Healthcare, Department of Obstetrics and Gynecology, University of Utah, Salt Lake City, UT, United States of America
- William Andrews
- School of Medicine, Department of Obstetrics and Gynecology, University of Alabama - Birmingham, Birmingham, AL, United States of America
- George R Saade
- Department of Obstetrics and Gynecology, University of Texas Medical Branch at Galveston, Galveston, TX, United States of America
- John V Ilekis
- Pregnancy and Perinatology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, United States of America
- Uma M Reddy
- Pregnancy and Perinatology Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, Bethesda, MD, United States of America
18
Nearest shrunken centroids via alternative genewise shrinkages. PLoS One 2017; 12:e0171068. [PMID: 28199352 PMCID: PMC5310887 DOI: 10.1371/journal.pone.0171068] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/04/2016] [Accepted: 01/16/2017] [Indexed: 11/22/2022]
Abstract
Nearest shrunken centroids (NSC) is a popular classification method for microarray data. NSC calculates centroids for each class and “shrinks” the centroids toward 0 using soft thresholding. Future observations are then assigned to the class with the minimum distance between the observation and the (shrunken) centroid. Under certain conditions, the soft shrinkage used by NSC is equivalent to a LASSO penalty. However, this penalty can produce biased estimates when the true coefficients are large. In addition, NSC ignores the fact that multiple measures of the same gene are likely to be related to one another. We consider several alternative genewise shrinkage methods to address these shortcomings of NSC. Three alternative penalties were considered: the smoothly clipped absolute deviation (SCAD), the adaptive LASSO (ADA), and the minimax concave penalty (MCP). We also showed that NSC can be performed in a genewise manner. Classification methods were derived for each alternative shrinkage method or alternative genewise penalty, and the performance of each new classification method was compared with that of conventional NSC on several simulated and real microarray data sets. Moreover, we applied the geometric mean approach to the alternative penalty functions. In general, the alternative (genewise) penalties required fewer genes than NSC. The geometric mean of the class-specific prediction accuracies was improved, as was the overall predictive accuracy in some cases. These results indicate that these alternative penalties should be considered when using NSC.
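The conventional NSC procedure summarised in this abstract can be sketched in a few steps: standardise each class centroid's offset from the overall centroid, soft-threshold it by delta, rebuild the shrunken centroid, then assign a new sample to the closest one. This is a minimal pure-Python sketch of conventional NSC, not of the SCAD, adaptive LASSO or MCP variants the paper proposes; the standardising factor m_k = sqrt(1/n_k - 1/n), the offset s0 = median(s_i) and the log-prior term follow common descriptions of NSC and should be treated as assumptions, as should the function names:

```python
import math
from statistics import median


def soft_threshold(d, delta):
    """sign(d) * max(|d| - delta, 0): the shrinkage applied to each standardized offset."""
    return math.copysign(max(abs(d) - delta, 0.0), d)


def fit_nsc(X, y, delta):
    """Fit nearest shrunken centroids. X: samples as lists of gene values; y: class labels.

    Requires at least two classes and more samples than classes.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    members = {k: [row for row, lab in zip(X, y) if lab == k] for k in classes}
    centroids = {k: [sum(r[i] for r in rows) / len(rows) for i in range(p)]
                 for k, rows in members.items()}
    # pooled within-class standard deviation of each gene
    s = []
    for i in range(p):
        ss = sum((r[i] - centroids[k][i]) ** 2 for k, rows in members.items() for r in rows)
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)  # keeps genes with tiny variance from dominating
    shrunk, priors = {}, {}
    for k, rows in members.items():
        m_k = math.sqrt(1.0 / len(rows) - 1.0 / n)  # assumed standardizing factor
        shrunk[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroids[k][i] - overall[i]) / scale
            shrunk[k].append(overall[i] + scale * soft_threshold(d, delta))
        priors[k] = len(rows) / n
    return {"shrunk": shrunk, "scale": [si + s0 for si in s], "priors": priors}


def predict_nsc(model, x):
    """Class whose shrunken centroid is nearest in standardized distance, adjusted by -2 log(prior)."""
    def score(k):
        dist = sum(((xi - ci) / si) ** 2
                   for xi, ci, si in zip(x, model["shrunk"][k], model["scale"]))
        return dist - 2.0 * math.log(model["priors"][k])
    return min(model["shrunk"], key=score)
```

Larger delta values set more standardized offsets to zero; a gene whose offset is zero in every class no longer influences the classification, which is how NSC performs gene selection. Delta is usually chosen by cross-validation.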
19
Saini H, Lal SP, Naidu VV, Pickering VW, Singh G, Tsunoda T, Sharma A. Gene masking - a technique to improve accuracy for cancer classification with high dimensionality in microarray data. BMC Med Genomics 2016; 9:74. [PMID: 28117659 PMCID: PMC5260793 DOI: 10.1186/s12920-016-0233-2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Background High-dimensional feature spaces generally degrade classification in several applications. In this paper, we propose a strategy called gene masking, in which non-contributing dimensions are heuristically removed from the data to improve classification accuracy. Methods Gene masking is implemented via a binary-encoded genetic algorithm that can be integrated seamlessly with classifiers during the training phase of classification to perform feature selection. It can also be used to identify the features that contribute most to the classification, thereby allowing researchers to isolate features that may have special significance. Results This technique was applied to publicly available datasets, where it substantially reduced the number of features used for classification while maintaining high accuracy. Conclusion The proposed technique can be extremely useful in feature selection, as it heuristically removes non-contributing features to improve the performance of classifiers.
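Gene masking searches over binary masks (1 keeps a feature, 0 removes it) with a genetic algorithm. A minimal pure-Python sketch, assuming tournament selection, uniform crossover, bit-flip mutation and elitism (the abstract does not specify the operators), and a hypothetical user-supplied `fitness(mask)` callable that in practice might return cross-validated classifier accuracy on the masked features; the function name and default parameters are also assumptions:

```python
import random


def gene_masking_ga(n_features, fitness, pop_size=20, generations=50,
                    mutation_rate=0.05, seed=0):
    """Search binary feature masks that maximise fitness(mask).

    Returns the best mask found and the best fitness in each generation
    (plus one final entry for the last population).
    """
    if pop_size < 2:
        raise ValueError("pop_size must be at least 2")
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []

    def tournament(scored):
        # the fitter of two randomly drawn individuals becomes a parent
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(m), m) for m in population]
        scored.sort(key=lambda s: s[0], reverse=True)
        history.append(scored[0][0])
        next_pop = [scored[0][1][:]]  # elitism: keep the best mask unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            # uniform crossover, then independent bit-flip mutation
            child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    scored = [(fitness(m), m) for m in population]
    best_fit, best_mask = max(scored, key=lambda s: s[0])
    history.append(best_fit)
    return best_mask, history
```

Because the best mask is always carried into the next generation, the best fitness never decreases for a deterministic fitness function; adding a penalty on the number of retained features would push the search toward smaller masks.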
Affiliation(s)
- Harsh Saini
- The University of the South Pacific, Laucala Bay, Suva, Fiji.
- Sunil Pranit Lal
- School of Engineering and Advanced Technology, Massey University, Palmerston North, New Zealand
- Gurmeet Singh
- The University of the South Pacific, Laucala Bay, Suva, Fiji
- Tatsuhiko Tsunoda
- RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; CREST, JST, Yokohama, 230-0045, Japan; Medical Research Institute, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan.
- Alok Sharma
- The University of the South Pacific, Laucala Bay, Suva, Fiji; RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; CREST, JST, Yokohama, 230-0045, Japan; Griffith University, Brisbane, Australia