1
de Souza VC, Rodrigues SA, Filho LRAG. Comparison of principal component analysis algorithms for imputation in agrometeorological data in high dimension and reduced sample size. PLoS One 2024; 19:e0315574. [PMID: 39739837 DOI: 10.1371/journal.pone.0315574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2023] [Accepted: 11/23/2024] [Indexed: 01/02/2025] Open
Abstract
Meteorological data acquired with precision, quality, and reliability are crucial in various agronomy fields, especially in studies related to reference evapotranspiration (ETo). ETo plays a fundamental role in the hydrological cycle, irrigation system planning and management, water demand modeling, water stress monitoring, water balance estimation, as well as in hydrological and environmental studies. However, temporal records often encounter issues such as missing measurements. The aim of this study was to evaluate the performance of alternative multivariate procedures for principal component analysis (PCA), using the Nonlinear Iterative Partial Least Squares (NIPALS) and Expectation-Maximization (EM) algorithms, for imputing missing data in time series of meteorological variables. This was carried out on high-dimensional and reduced-sample databases, covering different percentages of missing data. The databases, collected between 2011 and 2021, originated from 45 automatic weather stations in the São Paulo region, Brazil. They were used to create a daily time series of ETo. Five scenarios of missing data (10%, 20%, 30%, 40%, 50%) were simulated, in which datasets were randomly withdrawn from the ETo base. Subsequently, imputation was performed using the NIPALS-PCA, EM-PCA, and simple mean imputation (IM) procedures. This cycle was repeated 100 times, and average performance indicators were calculated. Statistical performance evaluation utilized the following indicators: correlation coefficient (r), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), Mean Square Error (MSE), Normalized Root Mean Square Error (nRMSE), Willmott Index (d), and performance index (c). In the scenario with 10% missing data, NIPALS-PCA achieved the lowest MAPE (15.4%), followed by EM-PCA (17.0%), while IM recorded a MAPE of 24.7%. 
In the scenario with 50% missing data, there was a performance reversal, with EM-PCA showing the lowest MAPE (19.1%), followed by NIPALS-PCA (19.9%). The NIPALS-PCA and EM-PCA approaches demonstrated good results in imputation (10% ≤ nRMSE < 20%), with NIPALS-PCA excelling in the 10%, 20%, and 30% scenarios, and EM-PCA in the 40% and 50% scenarios. Based on statistical evaluation, the NIPALS-PCA, EM-PCA, and IM imputation models proved suitable for estimating missing ETo data, with PCA imputation models in the NIPALS and EM algorithms showing the most promise. Future research should explore the effectiveness of various imputation methods in diverse climatic and geographical contexts, as well as develop new techniques considering the temporal and spatial structure of meteorological data, to advance understanding and climate prediction.
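Several of the indicators named in the abstract above can be computed directly from paired observed and imputed values. The following is a minimal sketch, not the authors' code: the function name and sample values are hypothetical, nRMSE is assumed to be normalized by the observed mean, and Willmott's d uses its common squared-difference form. The correlation coefficient r and the performance index c are left out.

```python
import math


def imputation_metrics(observed, predicted):
    """Return MAE, MAPE (%), MSE, nRMSE (%) and Willmott's d for paired values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    errors = [p - o for o, p in zip(observed, predicted)]

    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an observed value is zero; this sketch assumes none are.
    mape = 100.0 * sum(abs(e) / abs(o) for e, o in zip(errors, observed)) / n
    mse = sum(e * e for e in errors) / n
    # Assumption: RMSE is normalized by the mean of the observed values.
    nrmse = 100.0 * math.sqrt(mse) / mean_obs
    # Willmott's index of agreement: 1 - sum(P-O)^2 / sum(|P-Obar| + |O-Obar|)^2.
    denom = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    d = 1.0 - sum(e * e for e in errors) / denom
    return {"MAE": mae, "MAPE": mape, "MSE": mse, "nRMSE": nrmse, "d": d}


# Hypothetical daily ETo values (mm/day), for illustration only.
observed = [4.0, 5.0, 6.0, 5.0]
predicted = [4.5, 5.0, 5.5, 5.0]
metrics = imputation_metrics(observed, predicted)
```

In a simulation like the one described, such a function would be applied only to the entries that were artificially withdrawn, and the results averaged over the repetitions.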
Affiliation(s)
- Valter Cesar de Souza
  - São Paulo State University (Unesp), School of Agriculture, Botucatu, São Paulo, Brasil
2
Shao J, Wu Q, Zhang Y, Liu C, Huo X, Wang C. Automatic 3D pelvimetry framework in CT images and its validation. Sci Rep 2024; 14:21431. [PMID: 39271720 PMCID: PMC11399230 DOI: 10.1038/s41598-024-72123-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Accepted: 09/04/2024] [Indexed: 09/15/2024] Open
Abstract
In the field of spinal pathology, sagittal balance of the spine is usually judged by the spatial structure and morphology of the pelvis, which can be represented by pelvic parameters. Pelvic parameters, including pelvic incidence, pelvic tilt and sacral slope, are therefore essential for the diagnosis and treatment of spinal disorders. However, measuring these parameters by traditional methods is time-consuming and laborious. In this paper, an automatic measurement framework for pelvic CT images is proposed to calculate three-dimensional (3D) pelvic parameters with the support of deep learning technology. Pelvic images were first preprocessed, and 3D reconstruction was then performed with the Visualization Toolkit to obtain a 3D pelvic model. DRINet was trained to segment the femoral head region in the pelvic images, and 3D sphere fitting was performed to locate the femoral heads. In addition, VGG16 was adopted to recognize images containing the superior sacral endplate, and the plane growth algorithm was used to fit the plane so that the midpoint and normal vector of the superior sacral endplate could be obtained. Finally, 3D pelvic parameters were automatically calculated and compared with manual measurements for 15 patients. The proposed framework automatically generated 3D pelvic models and calculated two-dimensional (2D) and 3D pelvic parameters from continuous CT images. Experiments demonstrated that the framework can greatly speed up the calculation of pelvic parameters, and that these parameters are accurate when compared with the manual measurements. In conclusion, the proposed framework performs well on automatic pelvimetry by incorporating deep learning technology and can effectively replace the traditional methods for pelvic parameter measurement.
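The 3D sphere fitting step mentioned in the abstract above can be illustrated with a standard algebraic least-squares fit. This is a sketch under stated assumptions, not the authors' implementation: the function names and sample points are hypothetical, and the abstract does not say which fitting method the framework uses.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_sphere(points):
    """Algebraic least-squares sphere fit; returns (center, radius)."""
    # Model: x^2 + y^2 + z^2 = 2a*x + 2b*y + 2c*z + k,
    # where (a, b, c) is the center and k = r^2 - a^2 - b^2 - c^2.
    rows = [[2.0 * x, 2.0 * y, 2.0 * z, 1.0] for x, y, z in points]
    rhs = [x * x + y * y + z * z for x, y, z in points]
    # Normal equations (A^T A) u = A^T f for the unknowns u = (a, b, c, k).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    atf = [sum(r[i] * f for r, f in zip(rows, rhs)) for i in range(4)]
    a, b, c, k = solve_linear(ata, atf)
    radius = (k + a * a + b * b + c * c) ** 0.5
    return (a, b, c), radius


# Hypothetical surface points lying on a sphere centered at (1, 2, 3) with radius 2.
points = [(3, 2, 3), (-1, 2, 3), (1, 4, 3), (1, 0, 3), (1, 2, 5), (1, 2, 1)]
center, radius = fit_sphere(points)
```

With segmented femoral head voxels as input, the fitted center would serve as the femoral head location; noisy segmentations may call for a more robust fit than this one.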
Affiliation(s)
- Junlin Shao
  - School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
- Qian Wu
  - School of Humanistic Medicine, Anhui Medical University, Hefei, 230032, China
- Yuqian Zhang
  - School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
- Changqi Liu
  - NR Electric Co., Ltd, Nanjing, 211102, China
- Xing Huo
  - School of Mathematics, Hefei University of Technology, Hefei, 230009, China
- Changqing Wang
  - School of Biomedical Engineering, Anhui Medical University, Hefei, 230032, China
3
Han D, Kim S. Design optimization of large-scale bifacial photovoltaic module frame using deep learning surrogate model. Sci Rep 2024; 14:14592. [PMID: 38918445 PMCID: PMC11199489 DOI: 10.1038/s41598-024-64594-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2024] [Accepted: 06/11/2024] [Indexed: 06/27/2024] Open
Abstract
Recently, the wafers used in solar cells have been increasing in size, leading to larger module sizes and weights. The increased weight can cause deflection of photovoltaic (PV) module, which may lead to decreased cell efficiency. In this study, we developed a deep neural network (DNN)-based finite element (FE) surrogate model to obtain the optimal frame design factors that can improve deflection in large-scale bifacial PV module. Initially, an FE model was constructed for large-scale bifacial PV module. Based on this, the FE surrogate model was trained using 243 FEA datasets generated within the proposed range of factors. Furthermore, it was improved through Bayesian optimization and k-fold validation. As a result, the final loss value was 3.743 × 10⁻⁴, and the average mean absolute percentage error (MAPE) and coefficient of determination (R²) values for deflection and weight were 0.0017, 0.9972 for the training set, and 0.0020, 0.9962 for the test set, respectively. This indicates that the trained FE surrogate model possesses significant accuracy. After generating 1 million datasets within the range of frame design factors, the trained model was used to obtain predictions. Based on this data, the frame design factors that minimize both deflection and weight were identified as about a = 1.5, b = 13.7, c = 1.5, d = 3.0, e = 4.3. At this point, the deflection was 11.1 mm, and the weight was 3.6 kg. After altering the frame shape with the derived factors, FEA was conducted. The results matched for both deflection and weight, with almost no error. At this point, the weight increased by approximately 12.8% compared to the existing, while the deflection decreased by about 9.6%. Additionally, we analyzed the relationship between deflection and weight for each factor and secured the basis for the derived results. Consequently, our FE surrogate model accurately predicted the FEA results and quickly identified the optimal factors that minimize deflection and weight.
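Once a surrogate model has produced predictions for many candidate designs, one common way to look for designs that keep both deflection and weight low is a Pareto (non-dominated) filter. The abstract above does not state how the two objectives were combined, so the sketch below is only one plausible approach; the function name and the candidate values are hypothetical.

```python
def pareto_front(candidates):
    """Keep (deflection, weight) pairs that no other pair dominates; lower is better."""
    front = []
    for i, (d_i, w_i) in enumerate(candidates):
        dominated = any(
            # Candidate j dominates i if it is no worse in both objectives
            # and strictly better in at least one.
            (d_j <= d_i and w_j <= w_i) and (d_j < d_i or w_j < w_i)
            for j, (d_j, w_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((d_i, w_i))
    return front


# Hypothetical surrogate predictions: (deflection in mm, weight in kg).
candidates = [(10.0, 4.0), (12.0, 3.0), (11.0, 4.5), (13.0, 3.0), (9.0, 5.0)]
front = pareto_front(candidates)
```

This quadratic-time filter is fine for a short list; for a million predictions, sorting by one objective and sweeping the other would be the more practical design.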
Affiliation(s)
- Dongwoon Han
  - Gangwon Technology Application Division, Korea Institute of Industrial Technology, Wonju, 26336, Republic of Korea
- Seongtak Kim
  - Gangwon Technology Application Division, Korea Institute of Industrial Technology, Wonju, 26336, Republic of Korea
4
Chen H, Gu W, Zhang Q, Li X, Jiang X. Integrating attention mechanism and multi-scale feature extraction for fall detection. Heliyon 2024; 10:e31614. [PMID: 38831825 PMCID: PMC11145491 DOI: 10.1016/j.heliyon.2024.e31614] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2024] [Revised: 05/11/2024] [Accepted: 05/20/2024] [Indexed: 06/05/2024] Open
Abstract
Addressing the critical need for accurate fall event detection due to their potentially severe impacts, this paper introduces the Spatial Channel and Pooling Enhanced You Only Look Once version 5 small (SCPE-YOLOv5s) model. Fall events pose a challenge for detection due to their varying scales and subtle pose features. To address this problem, SCPE-YOLOv5s introduces spatial attention to the Efficient Channel Attention (ECA) network, which significantly enhances the model's ability to extract features from spatial pose distribution. Moreover, the model integrates average pooling layers into the Spatial Pyramid Pooling (SPP) network to support the multi-scale extraction of fall poses. Meanwhile, by incorporating the ECA network into SPP, the model effectively combines global and local features to further enhance the feature extraction. This paper validates the SCPE-YOLOv5s on a public dataset, demonstrating that it achieves a mean Average Precision of 88.29 %, outperforming the You Only Look Once version 5 small by 4.87 %. Additionally, the model achieves 57.4 frames per second. Therefore, SCPE-YOLOv5s provides a novel solution for fall event detection.
Affiliation(s)
- Hao Chen
  - School of Computer and Information Engineering, Nantong Institute of Technology, China
- Wenye Gu
  - Affiliated Hospital of Nantong University, China
- Qiong Zhang
  - School of Computer and Information Engineering, Nantong Institute of Technology, China
- Xiujing Li
  - School of Computer and Information Engineering, Nantong Institute of Technology, China
- Xiaojing Jiang
  - School of Computer and Information Engineering, Nantong Institute of Technology, China
5
Song C, Chen X, Tang C, Xue P, Jiang Y, Qiao Y. Artificial intelligence for HPV status prediction based on disease-specific images in head and neck cancer: A systematic review and meta-analysis. J Med Virol 2023; 95:e29080. [PMID: 37691329 DOI: 10.1002/jmv.29080] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/07/2023] [Revised: 07/14/2023] [Accepted: 08/03/2023] [Indexed: 09/12/2023]
Abstract
Accurate early detection of the human papillomavirus (HPV) status in head and neck cancer (HNC) is crucial to identify at-risk populations, stratify patients, personalize treatment options, and predict prognosis. Artificial intelligence (AI) is an emerging tool to dissect imaging features. This systematic review and meta-analysis aimed to evaluate the performance of AI in predicting HPV positivity from images of HPV-associated disease in HNC patients. A systematic literature search was conducted in databases including Ovid-MEDLINE, Embase, and Web of Science Core Collection for studies continuously published from inception up to October 30, 2022. Search strategies included keywords such as "artificial intelligence," "head and neck cancer," "HPV," and "sensitivity & specificity." Duplicates, articles without HPV predictions, letters, scientific reports, conference abstracts, and reviews were excluded. Binary diagnostic data were extracted to generate contingency tables, which were then used to calculate the pooled sensitivity (SE), specificity (SP), and area under the curve (AUC), each with its 95% confidence interval (CI). A random-effects model was used for the meta-analysis, and four subgroup analyses were further explored. In total, 22 original studies were included in the systematic review, 15 of which were eligible to generate 33 contingency tables for meta-analysis. The pooled SE and SP for all studies were 79% (95% CI: 75-82%) and 74% (95% CI: 69-78%), respectively, with an AUC of 0.83 (95% CI: 0.79-0.86). When only the contingency table with the highest accuracy was selected from each study, the analysis revealed a pooled SE of 79% (95% CI: 75-83%), SP of 75% (95% CI: 69-79%), and an AUC of 0.84 (95% CI: 0.81-0.87). Heterogeneity was moderate in the first analysis (I² for SE and SP: 51.70% and 51.01%) and low in the second (35.99% and 21.44%).
This evidence-based study showed an acceptable and promising performance for AI algorithms in predicting HPV status in HNC, although the performance was not comparable to that of routine p16 immunohistochemistry. The exploitation and optimization of AI algorithms warrant further research. Compared with previous studies, future studies are expected to make progress in the selection of databases, improvement of international reporting guidelines, and application of high-quality deep learning algorithms.
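Each contingency table mentioned above yields one sensitivity and one specificity value before any pooling. The sketch below shows only that per-table step; the counts are hypothetical, and the random-effects pooling reported in the abstract is not reproduced here.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Compute sensitivity and specificity from one 2x2 contingency table."""
    sensitivity = tp / (tp + fn)  # share of HPV-positive cases correctly identified
    specificity = tn / (tn + fp)  # share of HPV-negative cases correctly identified
    return sensitivity, specificity


# Hypothetical counts for a single study, for illustration only.
se, sp = sensitivity_specificity(tp=40, fp=15, fn=10, tn=45)
```

With these counts, 40 of 50 positives and 45 of 60 negatives are classified correctly, giving a sensitivity of 0.80 and a specificity of 0.75.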
Affiliation(s)
- Cheng Song
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Xu Chen
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chao Tang
  - Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China
- Peng Xue
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Yu Jiang
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Youlin Qiao
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
  - National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
6
Sun W, Song C, Tang C, Pan C, Xue P, Fan J, Qiao Y. Performance of deep learning algorithms to distinguish high-grade glioma from low-grade glioma: A systematic review and meta-analysis. iScience 2023; 26:106815. [PMID: 37250800 PMCID: PMC10209541 DOI: 10.1016/j.isci.2023.106815] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/01/2022] [Revised: 03/23/2023] [Accepted: 05/02/2023] [Indexed: 05/31/2023] Open
Abstract
This study aims to evaluate deep learning (DL) performance in differentiating low- and high-grade glioma. An online database search was conducted for studies published continuously from 1 January 2015 to 16 August 2022. A random-effects model was used for synthesis, based on pooled sensitivity (SE), specificity (SP), and area under the curve (AUC). Heterogeneity was estimated using the Higgins inconsistency index (I²). Thirty-three studies were ultimately included in the meta-analysis. The overall pooled SE and SP were 94% and 93%, with an AUC of 0.98. There was great heterogeneity in this field. Our evidence-based study shows that DL achieves high accuracy in glioma grading. Subgroup analysis reveals several limitations in this field: 1) the lack of a standard method for merging data for AI in diagnostic trials; 2) small sample sizes; 3) poor-quality image preprocessing; 4) non-standard algorithm development; 5) non-standard data reporting; 6) differing definitions of high-grade glioma (HGG) and low-grade glioma (LGG); and 7) poor extrapolation.
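The Higgins inconsistency index named above is conventionally derived from Cochran's Q as I² = max(0, (Q − df) / Q) × 100%, where df is the number of studies minus one. A minimal sketch follows; the effect sizes and variances are hypothetical and are not taken from the meta-analysis.

```python
def cochran_q(effects, variances):
    """Cochran's Q using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def higgins_i2(q, df):
    """Higgins I^2 in percent, truncated at zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical per-study effect estimates and their variances.
effects = [0.1, 0.3, 0.5]
variances = [0.01, 0.01, 0.01]
q = cochran_q(effects, variances)
i2 = higgins_i2(q, df=len(effects) - 1)
```

Here Q is 8 with 2 degrees of freedom, so I² comes out at 75%, a level usually read as substantial heterogeneity.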
Affiliation(s)
- Wanyi Sun
  - Department of Cancer Epidemiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Cheng Song
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Chao Tang
  - Shenzhen Maternity & Child Healthcare Hospital, Shenzhen, China
- Chenghao Pan
  - Department of Cancer Epidemiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Peng Xue
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Jinhu Fan
  - Department of Cancer Epidemiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
- Youlin Qiao
  - School of Population Medicine and Public Health, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China
7
Zhu X, Wang D, Pedrycz W, Li Z. A Design of Granular Classifier Based on Granular Data Descriptors. IEEE Transactions on Cybernetics 2023; 53:1790-1801. [PMID: 34936563 DOI: 10.1109/tcyb.2021.3132636] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Designing effective and efficient classifiers is a challenging task given the facts that data may exhibit different geometric structures and complex intrarelationships may exist within data. As a fundamental component of granular computing, information granules play a key role in human cognition. Therefore, it is of great interest to develop classifiers based on information granules such that highly interpretable human-centric models with higher accuracy can be constructed. In this study, we elaborate on a novel design methodology of granular classifiers in which information granules play a fundamental role. First, information granules are formed on the basis of labeled patterns following the principle of justifiable granularity. The diversity of samples embraced by each information granule is quantified and controlled in terms of the entropy criterion. This design implies that the information granules constructed in this way form sound homogeneous descriptors characterizing the structure and the diversity of available experimental data. Next, granular classifiers are built in the presence of formed information granules. The classification result for any input instance is determined by summing the contents of the related information granules weighted by membership degrees. The experiments concerning both synthetic data and publicly available datasets demonstrate that the proposed models exhibit better prediction abilities than some commonly encountered classifiers (namely, linear regression, support vector machine, Naïve Bayes, decision tree, and neural networks) and come with enhanced interpretability.
8
Szeghalmy S, Fazekas A. A Comparative Study of the Use of Stratified Cross-Validation and Distribution-Balanced Stratified Cross-Validation in Imbalanced Learning. Sensors (Basel, Switzerland) 2023; 23:2333. [PMID: 36850931 PMCID: PMC9967638 DOI: 10.3390/s23042333] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 12/22/2022] [Revised: 02/06/2023] [Accepted: 02/15/2023] [Indexed: 06/18/2023]
Abstract
Nowadays, the solution to many practical problems relies on machine learning tools. However, compiling the appropriate training data set for real-world classification problems is challenging because collecting the right amount of data for each class is often difficult or even impossible. In such cases, we can easily face the problem of imbalanced learning. There are many methods in the literature for solving the imbalanced learning problem, so it has become a serious question how to compare the performance of the imbalanced learning methods. Inadequate validation techniques can provide misleading results (e.g., due to data shift), which leads to the development of methods designed for imbalanced data sets, such as stratified cross-validation (SCV) and distribution optimally balanced SCV (DOB-SCV). Previous studies have shown that higher classification performance scores (AUC) can be achieved on imbalanced data sets using DOB-SCV instead of SCV. We investigated the effect of the oversamplers on this difference. The study was conducted on 420 data sets, involving several sampling methods and the DTree, kNN, SVM, and MLP classifiers. We point out that DOB-SCV often provides a little higher F1 and AUC values for classification combined with sampling. However, the results also prove that the selection of the sampler-classifier pair is more important for the classification performance than the choice between the DOB-SCV and the SCV techniques.
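Plain stratified cross-validation (SCV), one of the two schemes compared above, can be sketched by dealing the shuffled indices of each class across the folds in turn. The function name and toy labels below are hypothetical, and this is not the distribution-balanced variant (DOB-SCV), which additionally spreads nearby samples of a class over different folds.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of this class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Hypothetical imbalanced labels: 20 majority samples and 5 minority samples.
labels = [0] * 20 + [1] * 5
folds = stratified_folds(labels, k=5)
```

With these labels every fold receives four majority samples and exactly one minority sample, so each validation split keeps the 4:1 class ratio of the full set.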
9
Xu D, Chen R, Jiang Y, Wang S, Liu Z, Chen X, Fan X, Zhu J, Li J. Application of machine learning in the prediction of deficient mismatch repair in patients with colorectal cancer based on routine preoperative characterization. Front Oncol 2022; 12:1049305. [PMID: 36620593 PMCID: PMC9814116 DOI: 10.3389/fonc.2022.1049305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/21/2022] [Accepted: 12/07/2022] [Indexed: 12/24/2022] Open
Abstract
Simple summary: Detecting deficient mismatch repair (dMMR) in patients with colorectal cancer is essential for clinical decision-making, including evaluation of prognosis, guidance of adjuvant chemotherapy and immunotherapy, and primary screening for Lynch syndrome. However, outside of tertiary care centers, existing detection methods are not widely disseminated and highly depend on the experienced pathologist. Therefore, it is of great clinical significance to develop a broadly accessible and low-cost tool for dMMR prediction, particularly prior to surgery. In this study, we developed a convenient and reliable model for predicting dMMR status in CRC patients on routine preoperative characterization utilizing multiple machine learning algorithms. This model will work as an automated screening tool for identifying patients suitable for mismatch repair testing and consequently for improving the detection rate of dMMR, while reducing unnecessary labor and cost in patients with proficient mismatch repair.
Background: Deficient mismatch repair (dMMR) indicates a sustained anti-tumor immune response and has a favorable prognosis in patients with colorectal cancer (CRC). Although all CRC patients are recommended to undergo dMMR testing after surgery, current diagnostic approaches are not available for all country hospitals and patients. Therefore, efficient and low-cost predictive models for dMMR, especially for preoperative evaluations, are warranted.
Methods: A large scale of 5596 CRC patients who underwent surgical resection and mismatch repair testing were enrolled and randomly divided into training and validation cohorts. The clinical features exploited for predicting dMMR comprised the demographic characteristics, preoperative laboratory data, and tumor burden information. Machine learning (ML) methods involving eight basic algorithms, ensemble learning methods, and fusion algorithms were adopted with 10-fold cross-validation, and their performance was evaluated based on the area under the receiver operating characteristic curve (AUC) and calibration curves. The clinical net benefits were assessed using a decision curve analysis (DCA), and a nomogram was developed to facilitate model clinical practicality.
Results: All models achieved an AUC of nearly 0.80 in the validation cohort, with the stacking model exhibiting the best performance (AUC = 0.832). Logistical DCA revealed that the stacking model yielded more clinical net benefits than the conventional regression models. In the subgroup analysis, the stacking model also predicted dMMR regardless of the clinical stage. The nomogram showed a favorable consistence with the actual outcome in the calibration curve.
Conclusion: With the aid of ML algorithms, we developed a novel and robust model for predicting dMMR in CRC patients with satisfactory discriminative performance and designed a user-friendly and convenient nomogram.
Affiliation(s)
- Dong Xu
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - School of Clinical Medicine, Xi’an Medical University, Xi’an, China
- Rujie Chen
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - Department of Neurosurgery, Xijing Hospital, Air Force Medical University, Xi’an, China
  - State Key Laboratory of Cancer Biology, Institute of Digestive Diseases, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
- Yu Jiang
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - School of Clinical Medicine, Xi’an Medical University, Xi’an, China
- Shuai Wang
  - Xi’an Institute of Flight of the Air Force, Ming Gang Station Hospital, Minggang, China
- Zhiyu Liu
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - School of Clinical Medicine, Xi’an Medical University, Xi’an, China
- Xihao Chen
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - School of Clinical Medicine, Xi’an Medical University, Xi’an, China
- Xiaoyan Fan
  - Department of Experiment Surgery, Xijing Hospital, Fourth Military Medical University, Xi’an, China
- Jun Zhu
  - Department of General Surgery, The Southern Theater Air Force Hospital, Guangzhou, China
- Jipeng Li
  - Division of Digestive Surgery, Xijing Hospital of Digestive Diseases, Air Force Medical University, Xi’an, China
  - State Key Laboratory of Cancer Biology, Institute of Digestive Diseases, Xijing Hospital, The Fourth Military Medical University, Xi’an, China
  - Department of Experiment Surgery, Xijing Hospital, Fourth Military Medical University, Xi’an, China
*Correspondence: Jipeng Li; Jun Zhu
10
Machine learning for exploring neurophysiological functionality in multiple sclerosis based on trigeminal and hand blink reflexes. Sci Rep 2022; 12:21078. [PMID: 36473893 PMCID: PMC9726823 DOI: 10.1038/s41598-022-24720-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022] Open
Abstract
Brainstem dysfunctions are very common in Multiple Sclerosis (MS) and are a critical predictive factor for future disability. Brainstem functionality can be explored with blink reflexes, subcortical responses consisting of a blink following a peripheral stimulation. Some reflexes are already employed in clinical practice, such as the Trigeminal Blink Reflex (TBR). Here we propose, for the first time in MS, the exploration of the Hand Blink Reflex (HBR), whose size is modulated by the proximity of the stimulated hand to the face, reflecting the extension of the peripersonal space. The aim of this work is to test whether Machine Learning (ML) techniques could be used in combination with neurophysiological measurements such as TBR and HBR to improve their clinical information and potentially favour the early detection of brainstem dysfunctionality. HBR and TBR were recorded from a group of People with MS (PwMS) with the Relapsing-Remitting form and from a healthy control group. Two AdaBoost classifiers were trained, one with TBR features and one with HBR features, for a binary classification task between PwMS and Controls. Both classifiers were able to identify PwMS with an accuracy comparable to, and even higher than, that of clinicians. Our results indicate that ML techniques could represent a tool for clinicians for investigating brainstem functionality in MS. Also, HBR could be promising when applied in clinical practice, providing additional information about the integrity of brainstem circuits and potentially favouring early diagnosis.
11
Classification of walnut varieties obtained from walnut leaf images by the recommended residual block based CNN model. Eur Food Res Technol 2022. [DOI: 10.1007/s00217-022-04168-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
12
Jia X, Ma Z, Kong D, Li Y, Hu H, Guan L, Yan J, Zhang R, Gu Y, Chen X, Shi L, Luo X, Li Q, Bai B, Ye X, Zhai H, Zhang H, Dong Y, Xu L, Zhou J. Novel Human Artificial Intelligence Hybrid Framework Pinpoints Thyroid Nodule Malignancy and Identifies Overlooked Second-Order Ultrasonographic Features. Cancers (Basel) 2022; 14:4440. [PMID: 36139599 PMCID: PMC9497166 DOI: 10.3390/cancers14184440] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/15/2022] [Revised: 09/06/2022] [Accepted: 09/09/2022] [Indexed: 12/27/2022] Open
Abstract
We present a Human Artificial Intelligence Hybrid (HAIbrid) integrating framework that reweights Thyroid Imaging Reporting and Data System (TIRADS) features and the malignancy score predicted by a convolutional neural network (CNN) for nodule malignancy stratification and diagnosis. We defined extra ultrasonographical features from color Doppler images to explore malignancy-relevant features. We proposed Gated Attentional Factorization Machine (GAFM) to identify second-order interacting features trained via a 10-fold distribution-balanced stratified cross-validation scheme on ultrasound images of 3002 nodules all finally characterized by postoperative pathology (1270 malignant ones), retrospectively collected from 131 hospitals. Our GAFM-HAIbrid model demonstrated significant improvements in Area Under the Curve (AUC) value (p-value < 10⁻⁵), reaching about 0.92 over the standalone CNN (~0.87) and senior radiologists (~0.86), and identified a second-order vascularity localization and morphological pattern which was overlooked if only first-order features were considered. We validated the advantages of the integration framework on an already-trained commercial CNN system and our findings using an extra set of ultrasound images of 500 nodules. Our HAIbrid framework allows natural integration to clinical workflow for thyroid nodule malignancy risk stratification and diagnosis, and the proposed GAFM-HAIbrid model may help identify novel diagnosis-relevant second-order features beyond ultrasonography.
Affiliation(s)
- Xiaohong Jia
  - Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China
- Zehao Ma
  - School of Mathematical Sciences, Zhejiang University, Hangzhou 310013, China
  - Zhejiang Qiushi Institute for Mathematical Medicine, Hangzhou 311121, China
- Dexing Kong
  - School of Mathematical Sciences, Zhejiang University, Hangzhou 310013, China
  - Zhejiang Qiushi Institute for Mathematical Medicine, Hangzhou 311121, China
  - College of Mathematical Medicine, Zhejiang Normal University, Jinhua 321004, China
- Yaming Li
  - Department of Ultrasound, Puyang People’s Hospital, Puyang 457005, China
- Hairong Hu
  - Demetics Medical Technology, Hangzhou 310012, China
- Ling Guan
  - Department of Ultrasound, Gansu Provincial Cancer Hospital, Lanzhou 730050, China
- Jiping Yan
  - Department of Ultrasound, Shanxi Provincial People’s Hospital, Taiyuan 030012, China
- Ruifang Zhang
  - Department of Ultrasound, The First Affiliated Hospital, Zhengzhou University, Zhengzhou 450052, China
- Ying Gu
  - Department of Ultrasound, Affiliated Hospital of Guizhou Medical University, Guiyang 550001, China
- Xia Chen
  - Department of Ultrasound, Affiliated Hospital of Guizhou Medical University, Guiyang 550001, China
- Liying Shi
  - Department of Ultrasound, Affiliated Hospital of Guizhou Medical University, Guiyang 550001, China
- Xiaomao Luo
  - Department of Ultrasound, The Third Affiliated Hospital of Kunming Medical University, Yunnan Cancer Hospital, Kunming 650031, China
- Qiaoying Li
  - Department of Ultrasound Diagnostics, Tangdu Hospital, Fourth Military Medical University, Xi’an 710038, China
- Baoyan Bai
  - Department of Ultrasound, Affiliated Hospital of Yan’an University, School of Medicine, Yan’an University, Yan’an 716000, China
- Xinhua Ye
  - Department of Ultrasound, The First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China
- Hong Zhai
  - Department of Ultrasound, Traditional Chinese Medical Hospital of Xinjiang, Urumqi 830000, China
- Hua Zhang
  - Department of Ultrasound, Anyang Tumor Hospital, The Fourth Affiliated Hospital of Henan University of Science and Technology, Anyang 455000, China
- Yijie Dong
  - Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China
- Lei Xu
  - Zhejiang Qiushi Institute for Mathematical Medicine, Hangzhou 311121, China
- Jianqiao Zhou
  - Department of Ultrasound, Ruijin Hospital, Shanghai Jiaotong University School of Medicine, Shanghai 200025, China
|
13
|
Kim JH. Improvement of inceptionv3 model classification performance using chest X-ray images. J MECH MED BIOL 2022. [DOI: 10.1142/s0219519422400322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
14
|
Hu J, Wang J, Li J, Hu H, Wu B, Ren H, Wang J. AHLS-pred: a novel sequence-based predictor of acyl-homoserine-lactone synthases using machine learning algorithms. ENVIRONMENTAL MICROBIOLOGY REPORTS 2022; 14:616-631. [PMID: 35403334 DOI: 10.1111/1758-2229.13068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/15/2021] [Revised: 03/28/2022] [Accepted: 03/30/2022] [Indexed: 06/14/2023]
Abstract
Acyl-homoserine-lactones (AHLs), the major quorum sensing (QS) signalling molecules in Gram-negative bacteria, have shown great application potential in regulating biological nutrient removal processes. The identification of AHL synthases (AHLSs) plays an essential role in in-depth research on QS mechanisms and on applications in biological wastewater treatment processes. This work proposed the first prediction model for AHL synthases based on machine learning algorithms, namely AHLS-pred. The training dataset AHLS1400 and the independent testing dataset AHLS132 for AHLS prediction were first established. Three sequence-based feature extraction methods were used to generate feature descriptors: amino acid composition, dipeptide composition and G-gap dipeptide composition. Subsequently, the optimal features were obtained from the sorted feature descriptors (in F-score order) using a sequential forward search strategy. After comparing five different machine learning algorithms, the final prediction model was trained with a support vector machine classifier on AHLS1400 in fivefold cross-validation, which gave the best performance (ACC = 99.43%, MCC = 0.989, AUC = 0.997). The results show that AHLS-pred achieves an ACC of 94.70%, MCC of 0.894 and AUC of 0.995 on the independent testing dataset AHLS132. This demonstrates that AHLS-pred is a promising and powerful prediction method for accelerating the computational identification of AHLSs.
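The three descriptor families named in this abstract can be illustrated with a minimal Python sketch. It is not the AHLS-pred implementation: the function names, the handling of non-standard residues, and the normalization by the number of residue pairs are assumptions made for illustration only.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(seq):
    """Return the fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def gap_dipeptide_composition(seq, gap=0):
    """Return frequencies of residue pairs separated by `gap` positions.

    gap=0 gives the ordinary dipeptide composition; gap>0 gives a
    g-gap dipeptide composition. The result has 400 values, ordered
    AA, AC, ..., YY. Pairs containing a non-standard residue are not
    counted (an assumption of this sketch).
    """
    seq = seq.upper()
    total = len(seq) - gap - 1  # number of residue pairs at this gap
    if total <= 0:
        raise ValueError("sequence too short for this gap")
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(total):
        key = seq[i] + seq[i + gap + 1]
        if key in counts:
            counts[key] += 1
    return [counts[p] / total for p in pairs]
```

Concatenating the three vectors gives one fixed-length descriptor per sequence, which is the kind of input a feature-ranking step and a classifier such as an SVM would then consume.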
Affiliation(s)
- Jie Hu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Jin Wang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Jiahao Li
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Haidong Hu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Bin Wu
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Hongqiang Ren
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
- Jinfeng Wang
  - State Key Laboratory of Pollution Control and Resource Reuse, School of the Environment, Nanjing University, Nanjing, Jiangsu, 210023, China
|
15
|
Zhang ZL, Zhang CY, Luo XG, Zhou Q. A multiple classifiers system with roulette-based feature subspace selection for one-vs-one scheme. Pattern Anal Appl 2022. [DOI: 10.1007/s10044-022-01089-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
|
16
|
Morgan-Benita JA, Galván-Tejada CE, Cruz M, Galván-Tejada JI, Gamboa-Rosales H, Arceo-Olague JG, Luna-García H, Celaya-Padilla JM. Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features. Healthcare (Basel) 2022; 10:1362. [PMID: 35893185 PMCID: PMC9331873 DOI: 10.3390/healthcare10081362] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Revised: 07/11/2022] [Accepted: 07/15/2022] [Indexed: 11/16/2022] Open
Abstract
Type 2 diabetes mellitus (T2DM) represents one of the biggest health problems in Mexico, and early detection of the disease and its complications is extremely important. For noninvasive detection of T2DM, a machine learning (ML) approach can be used that relies on ensemble classification models with dichotomous output and is fast and effective for early detection and prediction of T2DM. In this article, a hard-voting ensemble technique is designed and implemented using generalized linear regression (GLM), support vector machines (SVM) and artificial neural networks (ANN) for the classification of T2DM patients. As a first step in the materials and methods, the data are balanced, standardized, imputed and fed into the three models to classify the patients with a dichotomous result. For feature selection, an implementation of LASSO with 10-fold cross-validation is developed, and the Area Under the Curve (AUC) is used for the final validation. LASSO selected 12 features, which are used in the implemented models to obtain the best possible scenario in the developed ensemble model. The best-performing of the three algorithms is SVM, which obtained an AUC of 92% ± 3%. The ensemble model built with GLM, SVM and ANN obtained an AUC of 90% ± 3%.
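Hard voting, the ensemble rule named in this abstract, can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' code: the function name `hard_vote` and the three example prediction lists are hypothetical, and the base models are assumed to have already produced dichotomous labels.

```python
from collections import Counter


def hard_vote(predictions):
    """Combine per-model label lists by majority vote.

    predictions: a list with one label list per model, all of equal length.
    Returns one label per sample: the label predicted by most models.
    With an odd number of models and binary labels no ties can occur.
    """
    if not predictions:
        raise ValueError("at least one model is required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same samples")
    # zip(*predictions) yields, for each sample, the votes of all models
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three classifiers on four patients (1 = T2DM)
glm_pred = [1, 0, 1, 0]
svm_pred = [1, 1, 0, 0]
ann_pred = [0, 1, 1, 0]
ensemble_pred = hard_vote([glm_pred, svm_pred, ann_pred])
```

Because the vote discards each model's confidence, an ensemble of this kind can score slightly below its best member, which is consistent with the AUC values reported in the abstract.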
Affiliation(s)
- Jorge A. Morgan-Benita
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Miguel Cruz
  - Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Jorge I. Galván-Tejada
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Jose G. Arceo-Olague
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Huizilopoztli Luna-García
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
  - Corresponding author
- José M. Celaya-Padilla
  - Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
  - Corresponding author
|
17
|
Impact of Regressand Stratification in Dataset Shift Caused by Cross-Validation. MATHEMATICS 2022. [DOI: 10.3390/math10142538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Data that have not been modeled cannot be correctly predicted. Under this assumption, this research studies how k-fold cross-validation can introduce dataset shift in regression problems. Such shift means that the data distributions in the training and test sets differ and, therefore, that the estimate of model performance deteriorates. Even though stratification of the output variable is widely used in classification to reduce the impact of dataset shift induced by cross-validation, its use in regression is not widespread in the literature. This paper analyzes the consequences for dataset shift of including different regressand stratification schemes in cross-validation with regression data. The results show that these schemes create more similar training and test sets, reducing the dataset shift related to cross-validation. The bias and deviation of the performance estimates obtained by regression algorithms improve with the highest numbers of strata, as does the number of cross-validation repetitions necessary to obtain these better results.
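One common way to stratify a continuous target can be sketched as follows. This is an assumed scheme for illustration, not necessarily one of the schemes evaluated in the paper: samples are sorted by target value, cut into strata of k consecutive samples (the largest possible number of strata), and each stratum deals one sample to every fold. The function name and parameters are hypothetical.

```python
import random


def stratified_regression_folds(y, k=5, seed=0):
    """Assign each sample a fold index in 0..k-1, stratified on the target y.

    Samples are sorted by target value and grouped into strata of k
    consecutive samples. Within each stratum the fold labels 0..k-1 are
    shuffled and handed out, so every fold receives one sample from each
    region of the target distribution.
    """
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])  # indices by target
    folds = [None] * len(y)
    for start in range(0, len(order), k):
        stratum = order[start:start + k]  # last stratum may be shorter
        labels = list(range(k))
        rng.shuffle(labels)
        for idx, fold in zip(stratum, labels):
            folds[idx] = fold
    return folds
```

Plain k-fold splitting would instead shuffle all indices at once, which can leave one fold with mostly high or mostly low target values and so produce the train/test mismatch the abstract describes.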
|
18
|
Wu L, Hu Y, Zhang X, Yuan B, Chen W, Liu K, Liu M. Temporal dynamics of clinical risk predictors for hospital-acquired acute kidney injury under different forecast time windows. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108655] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/05/2023]
|
19
|
Yang Y, Cai J, Yang H, Zhao X. Density clustering with divergence distance and automatic center selection. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2022.03.027] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
|
20
|
FT4cip: A new functional tree for classification in class imbalance problems. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.109294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
21
|
Random vector functional link network with subspace-based local connections. APPL INTELL 2022. [DOI: 10.1007/s10489-022-03404-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
|
22
|
Kyaw TY, Siegert CM, Dash P, Poudel KP, Pitts JJ, Renninger HJ. Using hyperspectral leaf reflectance to estimate photosynthetic capacity and nitrogen content across eastern cottonwood and hybrid poplar taxa. PLoS One 2022; 17:e0264780. [PMID: 35271605 PMCID: PMC8912144 DOI: 10.1371/journal.pone.0264780] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/04/2021] [Accepted: 02/16/2022] [Indexed: 02/05/2023] Open
Abstract
Eastern cottonwood (Populus deltoides W. Bartram ex Marshall) and hybrid poplars are well-known bioenergy crops. With advances in tree breeding, it is increasingly necessary to find economical ways to identify high-performing Populus genotypes that can be planted under different environmental conditions. Photosynthesis and leaf nitrogen content are critical parameters for plant growth; however, measuring them is an expensive and time-consuming process. Instead, these parameters can be quickly estimated from hyperspectral leaf reflectance if robust statistical models can be developed. To this end, we measured photosynthetic capacity parameters (Rubisco-limited carboxylation rate (Vcmax), electron transport-limited carboxylation rate (Jmax), and triose phosphate utilization-limited carboxylation rate (TPU)), nitrogen per unit leaf area (Narea), and leaf reflectance of seven taxa and 62 genotypes of Populus from two study plantations in Mississippi. For statistical modeling, we used least absolute shrinkage and selection operator (LASSO) and principal component analysis (PCA). Our results showed that the predictive ability of LASSO and PCA models was comparable, except for Narea, for which LASSO was superior. In terms of model interpretability, LASSO outperformed PCA because the LASSO models needed only 2 to 4 spectral reflectance wavelengths to estimate parameters. The LASSO models used reflectance values at 758 and 935 nm for estimating Vcmax (R² = 0.51 and RMSPE = 31%) and Jmax (R² = 0.54 and RMSPE = 32%); 687, 746, and 757 nm for estimating TPU (R² = 0.56 and RMSPE = 31%); and 304, 712, 921, and 1021 nm for estimating Narea (R² = 0.29 and RMSPE = 21%). The PCA model also identified 935 nm as a significant wavelength for estimating Vcmax and Jmax. Therefore, our results suggest that hyperspectral leaf reflectance modeling can be used as a cost-effective means for field phenotyping and rapid screening of Populus genotypes because of its capacity to estimate these physicochemical parameters.
Affiliation(s)
- Thu Ya Kyaw
  - Department of Forestry, Forest and Wildlife Research Center, Mississippi State University, Starkville, Mississippi, United States of America
  - Corresponding author
- Courtney M. Siegert
  - Department of Forestry, Forest and Wildlife Research Center, Mississippi State University, Starkville, Mississippi, United States of America
- Padmanava Dash
  - Department of Geosciences, Mississippi State University, Starkville, Mississippi, United States of America
- Krishna P. Poudel
  - Department of Forestry, Forest and Wildlife Research Center, Mississippi State University, Starkville, Mississippi, United States of America
- Justin J. Pitts
  - Department of Forestry, Forest and Wildlife Research Center, Mississippi State University, Starkville, Mississippi, United States of America
- Heidi J. Renninger
  - Department of Forestry, Forest and Wildlife Research Center, Mississippi State University, Starkville, Mississippi, United States of America
|
23
|
An approach of classifiers fusion based on hierarchical modifications. APPL INTELL 2022. [DOI: 10.1007/s10489-021-02777-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
24
|
Micro-Motion Classification of Flying Bird and Rotor Drones via Data Augmentation and Modified Multi-Scale CNN. REMOTE SENSING 2022. [DOI: 10.3390/rs14051107] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
To address the difficult problem of distinguishing flying birds from rotary-wing drones by radar, this paper proposes a micro-motion feature classification method. Using K-band frequency modulated continuous wave (FMCW) radar, data acquisition of five types of rotor drones (SJRC S70 W, DJI Mavic Air 2, DJI Inspire 2, hexacopter, and single-propeller fixed-wing drone) and flying birds is carried out in indoor and outdoor scenes. Then, the feature extraction and parameterization of the corresponding micro-Doppler (m-D) signal are performed using time-frequency (T-F) analysis. To increase the number of effective datasets and enhance m-D features, a data augmentation method is designed by setting the amplitude scope displayed in the T-F graph and fusing features of the range-time (modulation periods) graph and the T-F graph. A multi-scale convolutional neural network (CNN) is employed and modified so that it can extract both the global and local information of the target’s m-D features and reduce the parameter calculation burden. Validation with the measured dataset of different targets using FMCW radar shows that the average correct classification accuracy of the proposed algorithm for drones and flying birds in short- and long-range experiments is 9.4% and 4.6% higher than the AlexNet- and VGG16-based CNN methods, respectively.
|
25
|
Abhishek A, Jha RK, Sinha R, Jha K. Automated classification of acute leukemia on a heterogeneous dataset using machine learning and deep learning techniques. Biomed Signal Process Control 2022. [DOI: 10.1016/j.bspc.2021.103341] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/11/2022]
|
26
|
Smooth Group L1/2 Regularization for Pruning Convolutional Neural Networks. Symmetry (Basel) 2022. [DOI: 10.3390/sym14010154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/25/2023] Open
Abstract
In this paper, a novel smooth group L1/2 (SGL1/2) regularization method is proposed for pruning hidden nodes of the fully connected layer in convolution neural networks. Usually, the selection of nodes and weights is based on experience, and the convolution filter is symmetric in the convolution neural network. The main contribution of SGL1/2 is to try to approximate the weights to 0 at the group level. Therefore, we will be able to prune the hidden node if the corresponding weights are all close to 0. Furthermore, the feasibility analysis of this new method is carried out under some reasonable assumptions due to the smooth function. The numerical results demonstrate the superiority of the SGL1/2 method with respect to sparsity, without damaging the classification performance.
|
27
|
Wu L, Hu Y, Zhang X, Zhang J, Liu M. Development of a knowledge mining approach to uncover heterogeneous risk predictors of acute kidney injury across age groups. Int J Med Inform 2021; 158:104661. [PMID: 34915319 PMCID: PMC9177901 DOI: 10.1016/j.ijmedinf.2021.104661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/03/2021] [Revised: 10/21/2021] [Accepted: 12/05/2021] [Indexed: 10/19/2022]
Abstract
OBJECTIVES Acute kidney injury (AKI) risk increases with age and the underlying clinical predictors may be heterogeneous across age strata. This study aims to uncover the AKI risk factor heterogeneity among general inpatients across age groups using electronic medical records (EMR). METHODS Patient data (n = 179,370 encounters) were collected from an academic hospital between 2007 and 2016, and were stratified into four age groups: 18-35, 36-55, 56-65, and > 65. Potential risk factors extracted for the cohort included demographics, vital signs, laboratory values, past medical diagnoses, medications and admission diagnoses. We developed a data driven knowledge mining approach consisting of a machine learning algorithm to identify AKI predictors across age strata and a statistical method to quantify the impact of those factors on AKI risk. Identified predictors were evaluated for their predictability of AKI in terms of area-under-the-receiver-operating-characteristic-curve (AUC) and validated against expert knowledge. RESULTS Among the final analysis cohort of 76,957 hospital admissions, AKI prediction across age groups 18-35 (16.73%), 36-55 (32.74%), 56-65 (23.52%), and > 65 years (27.01%) achieved AUC of 0.85 (95% CI, 0.80-0.88), 0.86 (95% CI, 0.83-0.89), 0.87 (95% CI, 0.86-0.90), and 0.87 (95% CI, 0.86-0.90), respectively. Compared to expert knowledge, absolute consistency rates of the top-150 identified risk factors for each group were 78.4%, 77.2%, 81.3%, and 79.5%, respectively. Impact of many predictors on AKI varied across age groups; for example, high body mass index (BMI) was found to be associated with higher AKI risk in elderly patients, but low BMI was found to be associated with higher AKI risk in younger patients. CONCLUSIONS We verified the effectiveness of the knowledge mining method from the perspectives of accuracy, stability and credibility, and used this approach to clarify the heterogeneity of AKI risk factors between age groups. 
Future decision support systems need to consider such heterogeneity to enhance personalized patient care.
Affiliation(s)
- Lijuan Wu
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, China; Guangdong Engineering Technology Research Center for Big Data Precision Healthcare, Guangzhou 510632, China
- Yong Hu
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, China; Guangdong Engineering Technology Research Center for Big Data Precision Healthcare, Guangzhou 510632, China
- Xiangzhou Zhang
  - Big Data Decision Institute, Jinan University, Guangzhou 510632, China; Guangdong Engineering Technology Research Center for Big Data Precision Healthcare, Guangzhou 510632, China
- Jia Zhang
  - The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
- Mei Liu
  - Department of Internal Medicine, Division of Medical Informatics, University of Kansas Medical Center, Kansas City 66160, USA
|
28
|
An Overview of Supervised Machine Learning Methods and Data Analysis for COVID-19 Detection. JOURNAL OF HEALTHCARE ENGINEERING 2021; 2021:4733167. [PMID: 34853669 PMCID: PMC8629644 DOI: 10.1155/2021/4733167] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/11/2021] [Revised: 08/16/2021] [Accepted: 10/11/2021] [Indexed: 12/16/2022]
Abstract
Methods: Our analysis and machine learning algorithm are based on the two most cited clinical datasets from the literature: one from San Raffaele Hospital, Milan, Italy, and the other from Hospital Israelita Albert Einstein, São Paulo, Brazil. The datasets were processed to select the features that most influence the target, and it turned out that almost all of them are blood parameters. EDA (Exploratory Data Analysis) methods were applied to the datasets, and a comparative study of supervised machine learning models was done, after which the support vector machine (SVM) was selected as the one with the best performance. Results: As the best performer, SVM is used as our proposed supervised machine learning algorithm. An accuracy of 99.29%, sensitivity of 92.79%, and specificity of 100% were obtained with the dataset from Kaggle (https://www.kaggle.com/einsteindata4u/covid19) after applying optimization to SVM. The same procedure was performed with the dataset taken from San Raffaele Hospital (https://zenodo.org/record/3886927#.YIluB5AzbMV). Once more, the SVM presented the best performance among the machine learning algorithms, with 92.86%, 93.55%, and 90.91% for accuracy, sensitivity, and specificity, respectively. Conclusion: The obtained results, when compared with others from the literature based on these same datasets, are superior, leading us to conclude that our proposed solution is reliable for COVID-19 diagnosis.
|
29
|
Wu J, Ding W, Ye X, Wei Q, Lv X, Tang Q, Tian Y, Wang K, Jiang Y. Interictal Activity Is Associated With Slower Binocular Rivalry in Idiopathic Generalized Epilepsy. Front Neurol 2021; 12:720126. [PMID: 34867711 PMCID: PMC8634877 DOI: 10.3389/fneur.2021.720126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2021] [Accepted: 10/19/2021] [Indexed: 11/13/2022] Open
Abstract
Objective: Perceptual alternations evoked by binocular rivalry (BR) reflect cortical dynamics strongly dependent on the excitatory-inhibitory balance, suggesting potential utility as a biomarker for epileptogenesis. Therefore, we investigated the characteristics of BR in patients with idiopathic generalized epilepsy (IGE) and potential associations with clinical variables. Methods: Sixty-two healthy controls (HCs) and 94 IGE patients completed a BR task. Perceptual alternation rates were compared between HC and IGE groups as well as among the HC group and IGE patients stratified according to the presence or absence of interictal activity on the ambulatory electroencephalogram (EEG), termed the abnormal ambulatory EEG group (AB-AEEG, n = 64) and normal ambulatory EEG group (N-AEEG, n = 30), respectively. Results: The IGE patients demonstrated a slower rate of BR perceptual alternation than HC subjects (t = -4.364, p < 0.001). The alternation rate also differed among the HC, AB-AEEG, and N-AEEG groups (F = 44.962, df = 2, p < 0.001), and post hoc comparisons indicated a significantly slower alternation rate in the AB-AEEG group compared with the N-AEEG and HC groups (0.28 vs. 0.46 and 0.43 Hz). Stepwise linear regression revealed positive correlations between the BR alternation rate and both the ambulatory EEG status (β, 0.173; standard error, 0.022; p < 0.001) and Montreal Cognitive Assessment (MoCA) score (β, 0.013; standard error, 0.004; p = 0.003). Receiver operating characteristic curve analysis of the BR alternation rate distinguished AB-AEEG from N-AEEG subjects with 90.00% sensitivity and 76.90% specificity (area under the curve = 0.881; 95% confidence interval = 0.801-0.961; cut-off = 0.319). In contrast, the MoCA score did not accurately distinguish AB-AEEG from N-AEEG subjects, and the area under the receiver operating characteristic curve combining the BR alternation rate and MoCA score was not markedly larger than that of the BR alternation rate alone (0.894, 95% confidence interval = 0.822-0.966, p < 0.001). K-fold cross-validation was used to evaluate the predictive performance of the BR alternation rate, the MoCA score, and the combination of both, which yielded average AUC values of 0.870, 0.584, and 0.847, average sensitivity values of 89.36%, 92.73%, and 91.28%, and average specificity values of 62.25%, 13.42%, and 61.78%, respectively. The number of interictal epileptiform discharges was significantly correlated with the alternation rate in IGE patients (r = 0.296, p = 0.018). A forward stepwise linear regression model identified the number of interictal epileptiform discharges (β, 0.001; standard error, 0.001; p = 0.025) as an independent factor associated with BR alternation rate in these patients. Conclusion: These results suggest that interictal epileptiform discharges are associated with disruptions in perceptual awareness, and that BR may be a useful auxiliary behavioral task to diagnose and dynamically monitor IGE patients with interictal discharge.
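The sensitivity and specificity reported at a cut-off can be computed as in the Python sketch below. It assumes that an alternation rate below the cut-off is classified as positive (abnormal ambulatory EEG), which matches the direction of the group difference in the abstract but is not stated there explicitly. The function name and the six example values are hypothetical and are not data from the study.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) for a 'score < cutoff' decision rule.

    scores: measured values, e.g. alternation rates in Hz.
    labels: True for positive cases, False for negative cases.
    A sample is predicted positive when its score is below the cutoff
    (an assumption of this sketch).
    """
    tp = sum(1 for s, pos in zip(scores, labels) if pos and s < cutoff)
    fn = sum(1 for s, pos in zip(scores, labels) if pos and s >= cutoff)
    tn = sum(1 for s, pos in zip(scores, labels) if not pos and s >= cutoff)
    fp = sum(1 for s, pos in zip(scores, labels) if not pos and s < cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy values, chosen only to exercise the function
rates = [0.25, 0.30, 0.35, 0.45, 0.50, 0.28]
abnormal = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(rates, abnormal, cutoff=0.319)
```

Sweeping the cut-off over all observed scores and plotting sensitivity against 1 - specificity traces the receiver operating characteristic curve whose area the abstract reports.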
Affiliation(s)
- Jiaonan Wu
  - Department of Neurology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
  - Department of Neurology, Anhui Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Wei Ding
  - Department of Nephrology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Xing Ye
  - Department of Neurology, Nanjing Brain Hospital, Nanjing Medical University, Nanjing, China
  - Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
- Qiang Wei
  - Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Key Laboratory of Cognition and Neuropsychiatric Disorders, Hefei, China
- Xinyi Lv
  - Department of Neurology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
- Qiqiang Tang
  - Department of Neurology, The First Affiliated Hospital of USTC, Division of Life Sciences and Medicine, University of Science and Technology of China, Hefei, China
  - Department of Neurology, Anhui Provincial Hospital, Cheeloo College of Medicine, Shandong University, Jinan, China
- Yanghua Tian
  - Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Key Laboratory of Cognition and Neuropsychiatric Disorders, Hefei, China
- Kai Wang
  - Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Key Laboratory of Cognition and Neuropsychiatric Disorders, Hefei, China
- Yubao Jiang
  - Department of Neurology, The First Affiliated Hospital of Anhui Medical University, Hefei, China
  - Key Laboratory of Cognition and Neuropsychiatric Disorders, Hefei, China
|
30
|
Fan S, Zhang X, Song Z. Reinforced knowledge distillation: Multi-class imbalanced classifier based on policy gradient reinforcement learning. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2021.08.040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
31
|
Ghafoor M, Tariq SA, Zia T, Taj IA, Abbas A, Hassan A, Zomaya AY. Fingerprint Identification With Shallow Multifeature View Classifier. IEEE TRANSACTIONS ON CYBERNETICS 2021; 51:4515-4527. [PMID: 31880579 DOI: 10.1109/tcyb.2019.2957188] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/10/2023]
Abstract
This article presents an efficient fingerprint identification system that implements an initial classification for search-space reduction followed by minutiae neighbor-based feature encoding and matching. The current state-of-the-art fingerprint classification methods use a deep convolutional neural network (DCNN) to assign confidence for the classification prediction, and based on this prediction, the input fingerprint is matched with only the subset of the database that belongs to the predicted class. It can be observed for DCNNs that, as the architectures deepen, the farthest layers of the network learn more abstract information from the input images, which results in higher prediction accuracies. However, the downside is that DCNNs are data hungry and require large amounts of annotated (labeled) data to learn generalized network parameters for deeper layers. In this article, a shallow multifeature view CNN (SMV-CNN) fingerprint classifier is proposed that extracts: 1) fine-grained features from the input image and 2) abstract features from explicitly derived representations obtained from the input image. The multifeature views are fed to a fully connected neural network (NN) to compute a global classification prediction. The classification results show that the SMV-CNN achieved an improvement of 2.8% compared to a baseline CNN consisting of a single grayscale view on an open-source database. Moreover, in comparison with the state-of-the-art residual network (ResNet-50) image classification model, the proposed method performs comparably while being less complex and more efficient during training. The results of classification-based fingerprint identification show that the search space is reduced by over 50% without degradation of identification accuracy.
|
32
|
Maldonado S, López J, Iturriaga A. Out-of-time cross-validation strategies for classification in the presence of dataset shift. APPL INTELL 2021. [DOI: 10.1007/s10489-021-02735-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
|
33
|
Identification of CNGB1 as a Predictor of Response to Neoadjuvant Chemotherapy in Muscle-Invasive Bladder Cancer. Cancers (Basel) 2021; 13:cancers13153903. [PMID: 34359804 PMCID: PMC8345622 DOI: 10.3390/cancers13153903] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/21/2021] [Accepted: 07/27/2021] [Indexed: 01/12/2023] Open
Abstract
Simple Summary: Chemotherapy is recommended prior to surgical removal of the bladder for muscle-invasive bladder cancer patients. Despite a survival benefit, some patients do not respond and experience substantial toxicity and delay in surgery. Therefore, the identification of chemotherapy responders before initiating therapy would be a helpful clinical asset. To date, there are no reliable biomarkers routinely used in clinical practice that identify patients most likely to benefit from chemotherapy, and their identification is urgently required for more precise delivery of care. To address this issue, we compared gene expression profiles of biopsy materials from 30 chemotherapy-responder and -non-responder patients. This analysis revealed a novel signature gene set, with CNGB1 as a simpler proxy, as a promising biomarker for predicting the chemoresponsiveness of muscle-invasive bladder cancer patients. Our findings require further validation in larger patient cohorts and in a clinical trial setting.
Abstract: Cisplatin-based neoadjuvant chemotherapy (NAC) is recommended prior to radical cystectomy for muscle-invasive bladder cancer (MIBC) patients. Despite a 5–10% survival benefit, some patients do not respond and experience substantial toxicity and delay in surgery. To date, there are no clinically approved biomarkers predictive of response to NAC, and their identification is urgently required for more precise delivery of care. To address this issue, a multi-methods analysis approach of machine learning and differential gene expression analysis was undertaken on a cohort of 30 MIBC cases highly selected for an exquisitely strong response to NAC or marked resistance and/or progression (discovery cohort). The RGIFE (ranked guided iterative feature elimination) machine learning algorithm, previously demonstrated to select biomarkers with high predictive power, identified a 9-gene signature (CNGB1, GGH, HIST1H4F, IDO1, KIF5A, MRPL4, NCDN, PRRT3, SLC35B3) able to distinguish responders from non-responders with 100% predictive accuracy. This novel signature correlated with overall survival in a meta-analysis performed using published NAC-treated MIBC microarray data (validation cohort 1, n = 26, Log rank test, p = 0.02). Corroboration with differential gene expression analysis revealed the cyclic nucleotide-gated channel CNGB1 as the top-ranked upregulated gene in non-responders to NAC. A higher CNGB1 immunostaining score was seen in non-responders in tissue microarray analysis of the discovery cohort (n = 30, p = 0.02). Kaplan-Meier analysis of a further cohort of MIBC patients (validation cohort 2, n = 99) demonstrated that a high level of CNGB1 expression was associated with shorter cancer-specific survival (p < 0.001). Finally, in vitro studies showed that siRNA-mediated CNGB1 knockdown enhanced the cisplatin sensitivity of the MIBC cell lines J82 and 253JB-V. Overall, these data reveal a novel signature gene set, with CNGB1 as a simpler proxy, as a promising biomarker for predicting the chemoresponsiveness of MIBC patients.
|
35
|
Sánchez-Reyna AG, Celaya-Padilla JM, Galván-Tejada CE, Luna-García H, Gamboa-Rosales H, Ramirez-Morales A, Galván-Tejada JI. Multimodal Early Alzheimer's Detection, a Genetic Algorithm Approach with Support Vector Machines. Healthcare (Basel) 2021; 9:971. [PMID: 34442108 PMCID: PMC8391811 DOI: 10.3390/healthcare9080971] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/01/2021] [Accepted: 07/26/2021] [Indexed: 11/16/2022] Open
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease that mainly affects older adults. Currently, AD is associated with certain hypometabolic biomarkers, beta-amyloid peptides, hyperphosphorylated tau protein, and changes in brain morphology. Accurate diagnosis of AD, as well as of mild cognitive impairment (MCI), the prodromal stage of AD, is essential for early care of the disease. As a result, machine learning techniques have been used in recent years for the diagnosis of AD. In this research, we propose a novel methodology to generate a multivariate model that combines different types of features for the detection of AD. In order to obtain a robust biomarker, ADNI baseline data, clinical and neuropsychological assessments (1024 features) of 106 patients were used. The data were normalized, and a genetic algorithm was implemented to select the most significant features. Subsequently, a support vector machine was built as the multivariate classification model, and five-fold cross-validation was used to measure its performance, giving an AUC of 87.63%. Lastly, an independent blind test of our final model, using 20 patients not considered during the model construction, yielded an AUC of 100%.
Affiliation(s)
- Ana G. Sánchez-Reyna
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
- José M. Celaya-Padilla
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
- Huizilopoztli Luna-García
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
- Jorge I. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro Historico, Zacatecas 98000, Mexico
|
36
|
Liu L, Chen X, Petinrin OO, Zhang W, Rahaman S, Tang ZR, Wong KC. Machine Learning Protocols in Early Cancer Detection Based on Liquid Biopsy: A Survey. Life (Basel) 2021; 11:638. [PMID: 34209249 PMCID: PMC8308091 DOI: 10.3390/life11070638] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/07/2021] [Revised: 06/23/2021] [Accepted: 06/24/2021] [Indexed: 12/24/2022] Open
Abstract
With advances in liquid biopsy technology, there is increasing evidence that body fluids such as blood, urine, and saliva could harbor potential biomarkers associated with tumor origin. Traditional correlation analysis methods are no longer sufficient to capture the high-resolution, complex relationships between biomarkers and cancer subtype heterogeneity. To address this challenge, researchers have proposed applying machine learning techniques to liquid biopsy data to explore tumor origin. In this survey, we review the machine learning protocols and provide corresponding code demos for the approaches mentioned. We discuss algorithmic principles and frameworks extensively developed to reveal cancer mechanisms and consider the future prospects in biomarker exploration and cancer diagnostics.
Affiliation(s)
- Linjing Liu
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Xingjian Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Olutomilayo Olayemi Petinrin
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Weitong Zhang
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Saifur Rahaman
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Zhi-Ri Tang
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, China
- Hong Kong Institute for Data Science, City University of Hong Kong, Hong Kong, China
|
37
|
Wang S, Deng L, Xia X, Cao Z, Fei Y. Predicting antifreeze proteins with weighted generalized dipeptide composition and multi-regression feature selection ensemble. BMC Bioinformatics 2021; 22:340. [PMID: 34162327 PMCID: PMC8220696 DOI: 10.1186/s12859-021-04251-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/04/2021] [Accepted: 06/09/2021] [Indexed: 12/15/2022] Open
Abstract
Background: Antifreeze proteins (AFPs) are a group of proteins that inhibit the growth of ice crystals in body fluids and thus improve biological antifreeze ability. They are vital to the survival of living organisms in extremely cold environments. However, little research has been performed on sequence feature extraction and selection for antifreeze protein classification, which is of great significance for structure and function prediction. Results: In this paper, to predict antifreeze proteins, a feature representation of weighted generalized dipeptide composition (W-GDipC) and a two-stage, multi-regression ensemble feature selection method (LRMR-Ri) are proposed. Specifically, in the first stage of LRMR-Ri, four feature selection algorithms (Lasso regression, Ridge regression, maximal information coefficient and Relief) each select a feature set. If the four sets share a common feature subset, it is taken as the optimal subset; otherwise, in the second stage, Ridge regression selects the optimal subset from the set pooled from the four sets. The LRMR-Ri method combined with W-GDipC was applied both to the antifreeze proteins dataset (binary classification) and to the membrane protein dataset (multi-class classification). Experimental results show that this method performs well with support vector machine (SVM), decision tree (DT) and stochastic gradient descent (SGD) classifiers. With the antifreeze proteins dataset and the SVM classifier, the ACC, RE and MCC values of LRMR-Ri with W-GDipC reached 95.56%, 97.06% and 0.9105, respectively, much higher than those of each single method (Lasso, Ridge, MIC and Relief), and nearly 13% higher than Lasso alone for ACC. Conclusion: The experimental results show that the proposed LRMR-Ri and W-GDipC method can significantly improve the accuracy of antifreeze protein prediction compared with other similar single-feature methods. In addition, the method also achieved good results in the classification and prediction of membrane proteins, which supports its broad reliability to a certain extent.
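The two-stage decision rule described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `two_stage_select` and the `fallback_selector` callable are hypothetical, the four first-stage selectors are assumed to have already produced their feature sets, and the second-stage selector is passed in as a stand-in for the Ridge-based step rather than implemented here.

```python
from typing import Callable, Iterable, Set


def two_stage_select(
    candidate_sets: Iterable[Set[str]],
    fallback_selector: Callable[[Set[str]], Set[str]],
) -> Set[str]:
    """Hypothetical sketch of a two-stage ensemble feature selection rule.

    candidate_sets: feature sets produced by the first-stage selectors.
    fallback_selector: stand-in for the second-stage selector, applied
        to the pooled features when no common subset exists.
    """
    sets = [set(s) for s in candidate_sets]
    if not sets:
        return set()

    # Stage 1: features chosen by every first-stage selector.
    common = set.intersection(*sets)
    if common:
        return common

    # Stage 2: no common subset, so pool all candidates and let the
    # fallback selector pick from the pooled set.
    pooled = set.union(*sets)
    return fallback_selector(pooled)
```

For example, `two_stage_select([{"f1", "f2"}, {"f2", "f3"}], lambda pooled: pooled)` returns the shared feature `{"f2"}` without calling the fallback, while fully disjoint sets are pooled and handed to the fallback.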
Affiliation(s)
- Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China
- Lin Deng
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China
- Xinnan Xia
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, China
- Zicheng Cao
- School of Public Health (Shenzhen), Sun Yat-Sen University, Guangzhou, 510006, China
- Yu Fei
- School of Statistics and Mathematics, Yunnan University of Finance and Economics, Kunming, 650221, China
|
38
|
Yang LH, Liu J, Wang YM, Wang H, Martínez L. Enhancing extended belief rule-based systems for classification problems using decomposition strategy and overlap function. INT J MACH LEARN CYB 2021. [DOI: 10.1007/s13042-021-01355-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
|
39
|
Stapor K, Ksieniewicz P, García S, Woźniak M. How to design the fair experimental classifier evaluation. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107219] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 10/22/2022]
|
41
|
Schneider N, Sohrabi K, Schneider H, Zimmer KP, Fischer P, de Laffolie J. Machine Learning Classification of Inflammatory Bowel Disease in Children Based on a Large Real-World Pediatric Cohort CEDATA-GPGE® Registry. Front Med (Lausanne) 2021; 8:666190. [PMID: 34109197 PMCID: PMC8180568 DOI: 10.3389/fmed.2021.666190] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/09/2021] [Accepted: 04/26/2021] [Indexed: 01/31/2023] Open
Abstract
Introduction: The rising incidence of pediatric inflammatory bowel diseases (PIBD) creates a need for new methods of reducing diagnostic latency and improving quality of care and documentation. Machine learning models have been shown to be applicable to classifying PIBD when using histological data or extensive serology. This study aims to evaluate the performance of algorithms based on promptly available data more suited to clinical applications. Methods: Data on inflammatory locations of the bowel from initial and follow-up visits are extracted from the CEDATA-GPGE registry, and two follow-up sets are split off containing only input from 2017 and 2018. Pre-processing excludes patients in remission and encodes the categorical data numerically. For classification of PIBD diagnosis, a support vector machine (SVM), a random forest algorithm (RF), extreme gradient boosting (XGBoost), a dense neural network (DNN) and a convolutional neural network (CNN) are employed. As the best performer, the convolutional neural network is further improved using grid optimization. Results: The accuracy of the optimized neural network reaches up to 90.57% on data entered into the registry in 2018. The other methods range from 88.78% for the DNN down to 83.94% for XGBoost. The accuracy of prediction for the 2018 follow-up dataset is higher than for the older datasets. Neural networks yield a higher standard deviation, with 3.45 for the CNN compared to 0.83–0.86 for the support vector machine and ensemble methods. Discussion: The accuracy of the convolutional neural network demonstrates the viability of machine learning classification in PIBD diagnostics using only promptly available data.
Affiliation(s)
- Nicolas Schneider
- Institute of Medical Informatics, Justus-Liebig-University Giessen, Gießen, Germany
- Keywan Sohrabi
- Faculty of Health, Technical University of Applied Sciences Mittelhessen, Gießen, Germany
- Henning Schneider
- Institute of Medical Informatics, Justus-Liebig-University Giessen, Gießen, Germany
- Klaus-Peter Zimmer
- Department of Pediatrics, Justus-Liebig-University Giessen, Gießen, Germany
- Patrick Fischer
- Institute of Medical Informatics, Justus-Liebig-University Giessen, Gießen, Germany
- Jan de Laffolie
- Department of Pediatrics, Justus-Liebig-University Giessen, Gießen, Germany
|
42
|
Impact of Minutiae Errors in Latent Fingerprint Identification: Assessment and Prediction. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11094187] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
We study the impact of minutiae errors on the performance of latent fingerprint identification systems. We perform several experiments in which we remove ground-truth minutiae from latent fingerprints and evaluate the effects on matching score and rank-n identification using two different matchers and the popular NIST SD27 dataset. We observe that missing even one minutia from a fingerprint can have a significant negative impact on identification performance. Our experimental results show that a fingerprint with a top rank can be demoted to a bottom rank when two or more minutiae are missed. We also found that some minutiae are more critical than others for correctly identifying a latent fingerprint. Based on this finding, we created a dataset to train several machine learning models to predict the impact of each minutia on the matching score of a fingerprint identification system. Finally, our best-trained model can successfully predict whether a minutia will increase or decrease the matching score of a latent fingerprint.
|
43
|
Villa-Pérez ME, Álvarez-Carmona MÁ, Loyola-González O, Medina-Pérez MA, Velazco-Rossell JC, Choo KKR. Semi-supervised anomaly detection algorithms: A comparative summary and future research directions. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2021.106878] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/29/2022]
|
44
|
Abstract
Two new initialization methods for K-means clustering are proposed. Both proposals apply a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, the proposed methods are compared to the K-means++ and K-means‖ methods using an extensive set of reference and synthetic large-scale datasets. For the latter, a novel high-dimensional clustering data generation algorithm is given. The experiments show that the proposed methods compare favorably to the state of the art by improving clustering accuracy and the speed of convergence. We also observe that the currently most popular K-means++ initialization behaves like random initialization in very high-dimensional cases.
|
45
|
NPF: network propagation for protein function prediction. BMC Bioinformatics 2020; 21:355. [PMID: 32787776 PMCID: PMC7430911 DOI: 10.1186/s12859-020-03663-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/29/2020] [Accepted: 07/14/2020] [Indexed: 11/29/2022] Open
Abstract
Background: The accurate annotation of protein functions is of great significance in elucidating the phenomena of life, treating disease and developing new medicines. Various methods have been developed to facilitate the prediction of these functions by combining protein interaction networks (PINs) with multi-omics data. However, it is still challenging to make full use of multiple biological data sources to improve the performance of function annotation. Results: We present NPF (Network Propagation for Functions prediction), an integrative protein function prediction framework assisted by network propagation and functional module detection, for discovering interacting partners with functions similar to those of target proteins. NPF leverages knowledge of the protein interaction network architecture and multi-omics data, such as domain annotation and protein complex information, to augment protein-protein functional similarity in a propagation manner. We have verified the great potential of NPF for accurately inferring protein functions. In a comprehensive evaluation, NPF performed better than competing methods under both leave-one-out cross-validation and ten-fold cross-validation. Conclusions: We demonstrated that network propagation, together with multi-omics data, can discover more partners with similar function and is not constrained by the "small-world" feature of protein interaction networks. We conclude that the performance of function prediction depends greatly on whether appropriate functional similarity information can be extracted from protein correlations and exploited.
|
46
|
Peralta D, Saeys Y. Robust unsupervised dimensionality reduction based on feature clustering for single-cell imaging data. Appl Soft Comput 2020. [DOI: 10.1016/j.asoc.2020.106421] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
|
48
|
Kabakus AT, Senturk A. An analysis of the professional preferences and choices of computer engineering students. COMPUTER APPLICATIONS IN ENGINEERING EDUCATION 2020; 28:994-1006. [DOI: 10.1002/cae.22279] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 01/30/2020] [Accepted: 05/22/2020] [Indexed: 09/01/2023]
Affiliation(s)
- Arafat Senturk
- Department of Computer Engineering, Faculty of Engineering, Duzce University, Duzce, Turkey
|
49
|
Fingerprint Classification through Standard and Weighted Extreme Learning Machines. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10124125] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
Fingerprint classification is a stage of biometric identification systems that aims to group fingerprints and reduce search times and computational complexity in fingerprint databases. The most recent works on this problem propose methods based on deep convolutional neural networks (CNNs) that take fingerprint images as inputs. These networks have achieved high classification performance, but at a high computational cost in the network training process, even when using high-performance computing techniques. In this paper, we introduce a novel fingerprint classification approach based on feature extractor models and on basic and modified extreme learning machines (ELMs), which is the first time this approach has been adopted. The weighted ELMs naturally address the problem of unbalanced data, such as fingerprint databases. Some of the best and most recent extractors (Capelli02, Hong08, and Liu10), which are based on the most relevant visual characteristics of the fingerprint image, are considered. Given the unbalanced classes in fingerprint identification schemes, we optimize the ELMs (standard, original weighted, and decay weighted) in terms of the geometric mean by estimating their hyper-parameters (regularization parameter, number of hidden neurons, and decay parameter). At the same time, the classic accuracy and penetration-rate metrics are computed for comparison with the superior CNN-based methods reported in the literature. The experimental results show that the weighted ELM with the golden ratio in the weight matrix (W-ELM2) outperforms the other ELMs overall. The combination of the Hong08 extractor and W-ELM2 competes with CNNs in terms of fingerprint classification efficacy, while the ELM-based methods demonstrated extremely fast training speeds in every context.
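The abstract does not define the geometric mean it optimizes. A minimal sketch, assuming it refers to the geometric mean of per-class recalls (a common choice for unbalanced classification), shows why such a score penalizes a classifier that ignores a minority class. The function name `geometric_mean_recall` is an illustrative assumption, not taken from the paper.

```python
import math
from collections import Counter


def geometric_mean_recall(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition).

    Any class with zero recall drives the score to zero, unlike plain
    accuracy, which can stay high on unbalanced data.
    """
    totals = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))
```

With four samples of class 0 and two of class 1, a classifier that always predicts class 0 has an accuracy of 4/6 but a geometric mean of 0, because the recall on class 1 is 0.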
|
50
|
Supervised feature selection by constituting a basis for the original space of features and matrix factorization. INT J MACH LEARN CYB 2019. [DOI: 10.1007/s13042-019-01046-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/14/2022]
|