1
Guan Y, Xue Z, Wang J, Ai X, Chen R, Yi X, Lu S, Liu Y. SAFE-MIL: a statistically interpretable framework for screening potential targeted therapy patients based on risk estimation. Front Genet 2024; 15:1381851. [PMID: 39211737 PMCID: PMC11357964 DOI: 10.3389/fgene.2024.1381851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2024] [Accepted: 07/31/2024] [Indexed: 09/04/2024] Open
Abstract
Patients with the target gene mutation frequently derive significant clinical benefits from targeted therapy. However, differences in the abundance level of mutations among patients result in varying survival benefits, even among patients with the same target gene mutations. Currently, there is a lack of rational and interpretable models to assess the risk of treatment failure. In this study, we investigated the underlying coupled factors contributing to variations in medication sensitivity and established a statistically interpretable framework, named SAFE-MIL, for risk estimation. We first constructed an effectiveness label for each patient from the perspective of exploring the optimal grouping of patients' positive judgment values and sampled patients into 600 and 1,000 groups, respectively, based on multi-instance learning (MIL). A novel and interpretable loss function was further designed based on the Hosmer-Lemeshow test for this framework. By integrating multi-instance learning with the Hosmer-Lemeshow test, SAFE-MIL is capable of accurately estimating the risk of drug treatment failure across diverse patient cohorts while simultaneously providing the optimal threshold for risk stratification. We conducted a comprehensive case study involving 457 non-small cell lung cancer patients with EGFR mutations treated with EGFR tyrosine kinase inhibitors. Results demonstrate that SAFE-MIL outperforms traditional regression methods with higher accuracy and can accurately assess patients' risk stratification. This underscores its ability to accurately capture inter-patient variability in risk while providing statistical interpretability. SAFE-MIL is able to effectively guide clinical decision-making regarding the use of drugs in targeted therapy and provides an interpretable computational framework for other patient stratification problems.
The SAFE-MIL framework has proven its effectiveness in capturing inter-patient variability in risk and providing statistical interpretability. It outperforms traditional regression methods and can effectively guide clinical decision-making in the use of drugs for targeted therapy. SAFE-MIL offers a valuable interpretable computational framework that can be applied to other patient stratification problems, enhancing the precision of risk assessment in personalized medicine. The source code for SAFE-MIL is available for further exploration and application at https://github.com/Nevermore233/SAFE-MIL.
Affiliation(s)
- Yanfang Guan
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
  - Geneplus Beijing Institute, Beijing, China
- Zhengfa Xue
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Jiayin Wang
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
- Xinghao Ai
  - Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Xin Yi
  - Geneplus Beijing Institute, Beijing, China
- Shun Lu
  - Shanghai Chest Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
- Yuqian Liu
  - School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an, China
  - Shaanxi Engineering Research Center of Medical and Health Big Data, Xi’an Jiaotong University, Xi’an, China
2
Heidari M, Moattar MH, Ghaffari H. Forward propagation dropout in deep neural networks using Jensen-Shannon and random forest feature importance ranking. Neural Netw 2023; 165:238-247. [PMID: 37307667 DOI: 10.1016/j.neunet.2023.05.044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2022] [Revised: 04/13/2023] [Accepted: 05/23/2023] [Indexed: 06/14/2023]
Abstract
Dropout is a mechanism to prevent deep neural networks from overfitting and to improve their generalization. Random dropout is the simplest method, where nodes are randomly terminated at each step of the training phase, which may lead to network accuracy reduction. In dynamic dropout, the importance of each node and its impact on the network performance is calculated, and the important nodes do not participate in the dropout. But the problem is that the importance of the nodes is not calculated consistently. A node may be considered less important and be dropped in one training epoch and on a batch of data before entering the next epoch, in which it may be an important node. On the other hand, calculating the importance of each unit in every training step is costly. In the proposed method, using random forest and Jensen-Shannon divergence, the importance of each node is calculated once. Then, in the forward propagation steps, the importance of the nodes is propagated and used in the dropout mechanism. This method is evaluated and compared with some previously proposed dropout approaches using two different deep neural network architectures on the MNIST, NorB, CIFAR10, CIFAR100, SVHN, and ImageNet datasets. The results suggest that the proposed method has better accuracy with fewer nodes and better generalizability. Also, the evaluations show that the approach has comparable complexity with other approaches and its convergence time is low compared with state-of-the-art methods.
Affiliation(s)
- Mohsen Heidari
  - Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran.
- Hamidreza Ghaffari
  - Department of Computer Engineering, Ferdows Branch, Islamic Azad University, Ferdows, Iran.
3
Magris M, Iosifidis A. Bayesian learning for neural networks: an algorithmic survey. Artif Intell Rev 2023. [DOI: 10.1007/s10462-023-10443-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
The last decade witnessed a growing interest in Bayesian learning. Yet, the technicality of the topic and the multitude of ingredients involved therein, besides the complexity of turning theory into practical implementations, limit the use of the Bayesian learning paradigm, preventing its widespread adoption across different fields and applications. This self-contained survey engages and introduces readers to the principles and algorithms of Bayesian Learning for Neural Networks. It provides an introduction to the topic from an accessible, practical-algorithmic perspective. Upon providing a general introduction to Bayesian Neural Networks, we discuss and present both standard and recent approaches for Bayesian inference, with an emphasis on solutions relying on Variational Inference and the use of Natural gradients. We also discuss the use of manifold optimization as a state-of-the-art approach to Bayesian learning. We examine the characteristic properties of all the discussed methods, and provide pseudo-codes for their implementation, paying attention to practical aspects, such as the computation of the gradients.
4
Guo D, Xu C, Tao D. Bilinear Graph Networks for Visual Question Answering. IEEE Transactions on Neural Networks and Learning Systems 2023; 34:1023-1034. [PMID: 34428156 DOI: 10.1109/tnnls.2021.3104937] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 06/13/2023]
Abstract
This article revisits the bilinear attention networks (BANs) in the visual question answering task from a graph perspective. The classical BANs build a bilinear attention map to extract the joint representation of words in the question and objects in the image but do not fully explore the relationship between words for complex reasoning. In contrast, we develop bilinear graph networks to model the context of the joint embeddings of words and objects. Two kinds of graphs are investigated, namely, image-graph and question-graph. The image-graph transfers features of the detected objects to their related query words, enabling the output nodes to have both semantic and factual information. The question-graph exchanges information between these output nodes from image-graph to amplify the implicit yet important relationship between objects. These two kinds of graphs cooperate with each other, and thus, our resulting model can build the relationship and dependency between objects, which leads to the realization of multistep reasoning. Experimental results on the VQA v2.0 validation dataset demonstrate the ability of our method to handle complex questions. On the test-std set, our best single model achieves state-of-the-art performance, boosting the overall accuracy to 72.56%, and we are one of the top-two entries in the VQA Challenge 2020.
5
Fahad S, Su F, Khan SU, Naeem MR, Wei K. Implementing a novel deep learning technique for rainfall forecasting via climatic variables: An approach via hierarchical clustering analysis. The Science of the Total Environment 2023; 854:158760. [PMID: 36113802 DOI: 10.1016/j.scitotenv.2022.158760] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 05/08/2022] [Revised: 09/09/2022] [Accepted: 09/10/2022] [Indexed: 06/15/2023]
Abstract
Variations in rainfall negatively affect crop productivity and impose severe climatic conditions in developing regions. Studies that focus on climatic variations such as variability in rainfall and temperature are vital, particularly in predominantly rainfed areas. Forecasting rainfall is essential in the agriculture sector because many people depend on that sector, yet accurately predicting rainfall is very complex due to its dynamic nature. This study aims to present a deep forecasting model based on an optimized Gated Recurrent Unit (GRU) neural network to predict rainfall in Pakistan using 30 years of climate data from 1991 to 2020. The climatic variables were first extracted and then fine-tuned by eliminating outliers and extreme values from the data set for precise forecasting. Data normalization strategies were further utilized to adjust numeric values into a standard scale without distorting divergences or losing useful information. The proposed model achieved high prediction accuracy by maintaining minimal Normalized Mean Absolute Error (NMAE) and Normalized Root Mean Squared Error (NRMSE) compared to state-of-the-art rainfall forecasting models. Climatic variables used in the forecasting were evaluated in terms of correlation and regression analysis. The correlation results showed that temperature has a negative association and air quality variables have a positive association with rainfall in each quarter of the year. The second and third quarters of the year showed a high association with rainfall, whereas the air quality variables showed a lesser or no association with rainfall during the first and second quarters of the year. The results further showed a strong association of climatic variables with rainfall for all months of the year. The minimal loss achieved by the proposed model also demonstrated the feasibility of selected variables in precise forecasting of rainfall regardless of volatile climatic conditions.
Affiliation(s)
- Shah Fahad
  - School of Management, Hainan University, Haikou 570228, Hainan Province, China.
- Fang Su
  - School of Economics and Management, Northwest University, Xi'an, China
- Sufyan Ullah Khan
  - Department of Economics and Finance, UiS Business School, University of Stavanger, 4036 Stavanger, Norway
- Muhammad Rashid Naeem
  - School of Electronic Information and Artificial Intelligence, Leshan Normal University, Leshan 614000, China
- Kailei Wei
  - School of Management, Hainan University, Haikou 570228, Hainan Province, China.
6
Abrar S, Samad MD. Perturbation of deep autoencoder weights for model compression and classification of tabular data. Neural Netw 2022; 156:160-169. [PMID: 36270199 PMCID: PMC9669225 DOI: 10.1016/j.neunet.2022.09.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2022] [Revised: 07/18/2022] [Accepted: 09/19/2022] [Indexed: 11/16/2022]
Abstract
Fully connected deep neural networks (DNN) often include redundant weights leading to overfitting and high memory requirements. Additionally, in tabular data classification, DNNs are challenged by the often superior performance of traditional machine learning models. This paper proposes periodic perturbations (prune and regrow) of DNN weights, especially at the self-supervised pre-training stage of deep autoencoders. The proposed weight perturbation strategy outperforms dropout learning or weight regularization (L1 or L2) for four out of six tabular data sets in downstream classification tasks. Unlike dropout learning, the proposed weight perturbation routine additionally achieves 15% to 40% sparsity across six tabular data sets, resulting in compressed pretrained models. The proposed pretrained model compression improves the accuracy of downstream classification, unlike traditional weight pruning methods that trade off performance for model compression. Our experiments reveal that a pretrained deep autoencoder with weight perturbation can outperform traditional machine learning in tabular data classification, whereas baseline fully-connected DNNs yield the worst classification accuracy. However, traditional machine learning models are superior to any deep model when a tabular data set contains uncorrelated variables. Therefore, the performance of deep models with tabular data is contingent on the types and statistics of constituent variables.
Affiliation(s)
- Sakib Abrar
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, United States
- Manar D Samad
  - Department of Computer Science, Tennessee State University, Nashville, TN 37209, United States.
7
Lee C, Zhang Z, Janušonis S. Brain serotonergic fibers suggest anomalous diffusion-based dropout in artificial neural networks. Front Neurosci 2022; 16:949934. [PMID: 36267232 PMCID: PMC9577023 DOI: 10.3389/fnins.2022.949934] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022] Open
Abstract
Random dropout has become a standard regularization technique in artificial neural networks (ANNs), but it is currently unknown whether an analogous mechanism exists in biological neural networks (BioNNs). If it does, its structure is likely to be optimized by hundreds of millions of years of evolution, which may suggest novel dropout strategies in large-scale ANNs. We propose that the brain serotonergic fibers (axons) meet some of the expected criteria because of their ubiquitous presence, stochastic structure, and ability to grow throughout the individual's lifespan. Since the trajectories of serotonergic fibers can be modeled as paths of anomalous diffusion processes, in this proof-of-concept study we investigated a dropout algorithm based on the superdiffusive fractional Brownian motion (FBM). The results demonstrate that serotonergic fibers can potentially implement a dropout-like mechanism in brain tissue, supporting neuroplasticity. They also suggest that mathematical theories of the structure and dynamics of serotonergic fibers can contribute to the design of dropout algorithms in ANNs.
Affiliation(s)
- Christian Lee
  - Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA, United States
- Zheng Zhang
  - Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA, United States
- Skirmantas Janušonis
  - Department of Psychological and Brain Sciences, University of California, Santa Barbara, Santa Barbara, CA, United States
8
Xie J, Ma Z, Lei J, Zhang G, Xue JH, Tan ZH, Guo J. Advanced Dropout: A Model-Free Methodology for Bayesian Dropout Optimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 2022; 44:4605-4625. [PMID: 34029187 DOI: 10.1109/tpami.2021.3083089] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 05/25/2023]
Abstract
Due to lack of data, overfitting ubiquitously exists in real-world applications of deep neural networks (DNNs). We propose advanced dropout, a model-free methodology, to mitigate overfitting and improve the performance of DNNs. The advanced dropout technique applies a model-free and easily implemented distribution with parametric prior, and adaptively adjusts the dropout rate. Specifically, the distribution parameters are optimized by stochastic gradient variational Bayes in order to carry out an end-to-end training. We evaluate the effectiveness of the advanced dropout against nine dropout techniques on seven computer vision datasets (five small-scale datasets and two large-scale datasets) with various base models. The advanced dropout outperforms all the referred techniques on all the datasets. We further compare the effectiveness ratios and find that advanced dropout achieves the highest one in most cases. Next, we conduct a set of analyses of dropout rate characteristics, including convergence of the adaptive dropout rate, the learned distributions of dropout masks, and a comparison with dropout rate generation without an explicit distribution. In addition, the ability of overfitting prevention is evaluated and confirmed. Finally, we extend the application of the advanced dropout to uncertainty inference, network pruning, text classification, and regression. The proposed advanced dropout is also superior to the corresponding referred methods. Codes are available at https://github.com/PRIS-CV/AdvancedDropout.
9
Fuzzy rule dropout with dynamic compensation for wide learning algorithm of TSK fuzzy classifier. Appl Soft Comput 2022. [DOI: 10.1016/j.asoc.2022.109410] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
10
Qin B, Chung FL, Wang S. KAT: A Knowledge Adversarial Training Method for Zero-Order Takagi-Sugeno-Kang Fuzzy Classifiers. IEEE Transactions on Cybernetics 2022; 52:6857-6871. [PMID: 33284765 DOI: 10.1109/tcyb.2020.3034792] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 06/12/2023]
Abstract
While input or output-perturbation-based adversarial training techniques have been exploited to enhance the generalization capability of a variety of nonfuzzy and fuzzy classifiers by means of dynamic regularization, their performance may be very sensitive to some inappropriate adversarial samples. In order to avoid this weakness and simultaneously ensure enhanced generalization capability, this work attempts to explore a novel knowledge adversarial attack model for the zero-order Takagi-Sugeno-Kang (TSK) fuzzy classifiers. The proposed model is motivated by exploiting the existence of special knowledge adversarial attacks from the perspective of the human-like thinking process when training an interpretable zero-order TSK fuzzy classifier. Without any direct use of adversarial samples, which is different from input or output perturbation-based adversarial attacks, the proposed model considers adversarial perturbations of interpretable zero-order fuzzy rules in a knowledge-oblivion and/or knowledge-bias or their ensemble to mimic the robust use of knowledge in the human thinking process. Through dynamic regularization, the proposed model is theoretically justified for its strong generalization capability. Accordingly, a novel knowledge adversarial training method called KAT is devised to achieve promising generalization performance, interpretability, and fast training for zero-order TSK fuzzy classifiers. The effectiveness of KAT is manifested by the experimental results on 15 benchmarking UCI and KEEL datasets.
11
Multi-Modal Alignment of Visual Question Answering Based on Multi-Hop Attention Mechanism. Electronics 2022. [DOI: 10.3390/electronics11111778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/04/2022]
Abstract
The alignment of information between the image and the question is of great significance in the visual question answering (VQA) task. Self-attention is commonly used to generate attention weights between image and question. These attention weights can align the two modalities: through them, the model can select the relevant area of the image to align with the question. However, when using the self-attention mechanism, the attention weight between two objects is determined only by the representations of these two objects. It ignores the influence of other objects around them. This contribution proposes a novel multi-hop attention alignment method that enriches surrounding information when using self-attention to align two modalities. Simultaneously, in order to utilize position information in alignment, we also propose a position embedding mechanism. The position embedding mechanism extracts the position information of each object and uses it to align the question word with the correct position in the image. According to the experiment on the VQA2.0 dataset, our model achieves validation accuracy of 65.77%, outperforming several state-of-the-art methods. The experimental result shows that our proposed methods have better performance and effectiveness.
12
Suhang G, Vong CM, Wong PK, Wang S. Fast Training of Adversarial Deep Fuzzy Classifier by Downsizing Fuzzy Rules With Gradient Guided Learning. IEEE Transactions on Fuzzy Systems 2022; 30:1967-1980. [DOI: 10.1109/tfuzz.2021.3072498] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/30/2024]
Affiliation(s)
- Gu Suhang
  - School of AI and Computer Science and the Jiangsu Province Key Laboratory of Media Design and Software Technologies, Jiangnan University, Wuxi, China
- Chi Man Vong
  - Faculty of Science and Technology, University of Macau, Taipa, Macau, China
- Pak Kin Wong
  - Faculty of Science and Technology, University of Macau, Taipa, Macau, China
- Shitong Wang
  - School of AI and Computer Science, Jiangnan University, Wuxi, China
13
Bârzan H, Ichim AM, Moca VV, Mureşan RC. Time-Frequency Representations of Brain Oscillations: Which One Is Better? Front Neuroinform 2022; 16:871904. [PMID: 35492077 PMCID: PMC9050353 DOI: 10.3389/fninf.2022.871904] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/08/2022] [Accepted: 03/21/2022] [Indexed: 02/02/2023] Open
Abstract
Brain oscillations are thought to subserve important functions by organizing the dynamical landscape of neural circuits. The expression of such oscillations in neural signals is usually evaluated using time-frequency representations (TFR), which resolve oscillatory processes in both time and frequency. While a vast number of methods exist to compute TFRs, there is often no objective criterion to decide which one is better. In feature-rich data, such as that recorded from the brain, sources of noise and unrelated processes abound and contaminate results. The impact of these distractor sources is especially problematic, such that TFRs that are more robust to contaminants are expected to provide more useful representations. In addition, the minutiae of the techniques themselves impart better or worse time and frequency resolutions, which also influence the usefulness of the TFRs. Here, we introduce a methodology to evaluate the "quality" of TFRs of neural signals by quantifying how much information they retain about the experimental condition during visual stimulation and recognition tasks, in mice and humans, respectively. We used machine learning to discriminate between various experimental conditions based on TFRs computed with different methods. We found that various methods provide more or less informative TFRs depending on the characteristics of the data. In general, however, more advanced techniques, such as the superlet transform, seem to provide better results for complex time-frequency landscapes, such as those extracted from electroencephalography signals. Finally, we introduce a method based on feature perturbation that is able to quantify how much time-frequency components contribute to the correct discrimination among experimental conditions. The methodology introduced in the present study may be extended to other analyses of neural data, enabling the discovery of data features that are modulated by the experimental manipulation.
Affiliation(s)
- Harald Bârzan
  - Department of Theoretical and Experimental Neuroscience, Transylvanian Institute of Neuroscience, Cluj-Napoca, Romania
  - Department of Electronics, Telecommunications and Informational Technologies, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
- Ana-Maria Ichim
  - Department of Theoretical and Experimental Neuroscience, Transylvanian Institute of Neuroscience, Cluj-Napoca, Romania
  - Department of Electronics, Telecommunications and Informational Technologies, Technical University of Cluj-Napoca, Cluj-Napoca, Romania
- Vasile Vlad Moca
  - Department of Theoretical and Experimental Neuroscience, Transylvanian Institute of Neuroscience, Cluj-Napoca, Romania
- Raul Cristian Mureşan
  - Department of Theoretical and Experimental Neuroscience, Transylvanian Institute of Neuroscience, Cluj-Napoca, Romania
14
Xia W, Zheng L, Fang J, Li F, Zhou Y, Zeng Z, Zhang B, Li Z, Li H, Zhu F. PFmulDL: a novel strategy enabling multi-class and multi-label protein function annotation by integrating diverse deep learning methods. Comput Biol Med 2022; 145:105465. [PMID: 35366467 DOI: 10.1016/j.compbiomed.2022.105465] [Citation(s) in RCA: 38] [Impact Index Per Article: 12.7] [Received: 03/04/2022] [Revised: 03/22/2022] [Accepted: 03/25/2022] [Indexed: 02/06/2023]
Abstract
Bioinformatic annotation of protein function is essential but extremely sophisticated, which calls for extensive efforts to develop effective prediction methods. However, the existing methods tend to amplify the representativeness of the families with a large number of proteins by misclassifying the proteins in the families with a small number of proteins. That is to say, the ability of the existing methods to annotate proteins in the 'rare classes' remains limited. Herein, a new protein function annotation strategy, PFmulDL, integrating multiple deep learning methods, was thus constructed. First, the recurrent neural network was integrated, for the first time, with the convolutional neural network to facilitate the function annotation. Second, a transfer learning method was introduced to the model construction to further improve the prediction performance. Third, based on the latest data of Gene Ontology, the newly constructed model could annotate the largest number of protein families compared with the existing methods. Finally, this newly constructed model was found capable of significantly elevating the prediction performance for the 'rare classes' without sacrificing that for the 'major classes'. All in all, due to the emerging requirements on improving the prediction performance for the proteins in 'rare classes', this new strategy would become an essential complement to the existing methods for protein function prediction. All the models and source codes are freely available and open to all users at: https://github.com/idrblab/PFmulDL.
Affiliation(s)
- Weiqi Xia
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Lingyan Zheng
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Jiebin Fang
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Fengcheng Li
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Ying Zhou
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China
- Zhenyu Zeng
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Bing Zhang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Zhaorong Li
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China
- Honglin Li
  - School of Pharmacy, East China University of Science and Technology, Shanghai, 200237, China.
- Feng Zhu
  - College of Pharmaceutical Sciences, The Second Affiliated Hospital, Zhejiang University School of Medicine, Zhejiang University, Hangzhou, 310058, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou, 330110, China.
15
Zhu M, Wang Q, Luo J. Emotion Recognition Based on Dynamic Energy Features Using a Bi-LSTM Network. Front Comput Neurosci 2022; 15:741086. [PMID: 35264939 PMCID: PMC8900638 DOI: 10.3389/fncom.2021.741086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/14/2021] [Accepted: 12/31/2021] [Indexed: 11/22/2022] Open
Abstract
Among electroencephalogram (EEG) signal emotion recognition methods based on deep learning, most methods have difficulty in using a high-quality model due to the low resolution and the small sample size of EEG images. To solve this problem, this study proposes a deep network model based on dynamic energy features. In this method, first, to reduce the noise superposition caused by feature analysis and extraction, the concept of an energy sequence is proposed. Second, to obtain the feature set reflecting the time persistence and multicomponent complexity of EEG signals, the construction method of the dynamic energy feature set is given. Finally, to make the network model suitable for small datasets, we used fully connected layers and bidirectional long short-term memory (Bi-LSTM) networks. To verify the effectiveness of the proposed method, we used leave one subject out (LOSO) and 10-fold cross validation (CV) strategies to carry out experiments on the SEED and DEAP datasets. The experimental results show that the accuracy of the proposed method can reach 89.42% (SEED) and 77.34% (DEAP).
Affiliation(s)
- Meili Zhu
- Modern Animation Technology Engineering Research Center of Jilin Higher Learning Institutions, Jilin Animation Institute, Changchun, China
|
16
|
A wide interpretable Gaussian Takagi-Sugeno-Kang fuzzy classifier and its incremental learning. Knowl Based Syst 2022. [DOI: 10.1016/j.knosys.2022.108203] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/22/2022]
|
17
|
Thakkar A, Lohiya R. Analyzing fusion of regularization techniques in the deep learning-based intrusion detection system. Int J Intell Syst 2021. [DOI: 10.1002/int.22590] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Affiliation(s)
- Ankit Thakkar
- Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
- Ritika Lohiya
- Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad, Gujarat, India
|
18
|
Concrete Cracks Detection and Monitoring Using Deep Learning-Based Multiresolution Analysis. Electronics 2021. [DOI: 10.3390/electronics10151772] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Indexed: 02/03/2023]
Abstract
In this paper, we propose a new methodology for crack detection and monitoring in concrete structures. This approach is based on a multiresolution analysis of a sample or a specimen of concrete material subjected to several types of solicitation. The image obtained by ultrasonic investigation and processed by a customized wavelet is analyzed at various scales in order to detect internal cracks and crack initiation. The ultimate objective of this work is to propose an automatic crack type identification scheme based on convolutional neural networks (CNN). In this context, crack propagation can be monitored without access to the concrete surface and the goal is to detect cracks before they are visible. This is achieved through the combination of two major data analysis tools which are wavelets and deep learning. This original procedure is shown to yield a high accuracy close to 90%. In order to evaluate the performance of the proposed CNN architectures, we also used an open access database, SDNET2018, for the automatic detection of external cracks.
|
19
|
Samad MD, Hossain R, Iftekharuddin KM. Dynamic Perturbation of Weights for Improved Data Reconstruction in Unsupervised Learning. PROCEEDINGS OF ... INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS. INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS 2021; 2021:10.1109/ijcnn52387.2021.9533539. [PMID: 36157884 PMCID: PMC9493331 DOI: 10.1109/ijcnn52387.2021.9533539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
The concept of weight pruning has shown success in neural network model compression with marginal loss in classification performance. However, similar concepts have not been well recognized in improving unsupervised learning. To the best of our knowledge, this paper proposes one of the first studies on weight pruning in unsupervised autoencoder models using non-imaging data points. We adapt the weight pruning concept to investigate the dynamic behavior of weights while reconstructing data using an autoencoder and propose a deterministic model perturbation algorithm based on the weight statistics. The model perturbation at periodic intervals resets a percentage of weight values using a binary weight mask. Experiments across eight non-imaging data sets ranging from gene sequence to swarm behavior data show that only a few periodic perturbations of weights improve the data reconstruction accuracy of autoencoders and additionally introduce model compression. All data sets yield a small portion of (<5%) weights that are substantially higher than the mean weight value. These weights are found to be much more informative than a substantial portion (>90%) of the weights with negative values. In general, the perturbation of low or negative weight values at periodic intervals has improved the data reconstruction loss for most data sets when compared to the case without perturbation. The proposed approach may help explain and correct the dynamic behavior of neural network models in a deterministic way for data reconstruction and obtaining a more accurate representation of latent variables using autoencoders.
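The periodic, mask-based reset described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the selection rule (reset the lowest-valued weights), the reset value (zero), and the names `build_mask`, `perturb`, `train_with_perturbation`, `period`, and `reset_fraction` are all assumptions made for illustration.

```python
def build_mask(weights, reset_fraction):
    """Return a 0/1 mask; 0 marks the lowest-valued weights (assumed rule)."""
    n_reset = int(len(weights) * reset_fraction)
    # Indices ordered from the smallest (most negative) weight upward.
    order = sorted(range(len(weights)), key=lambda i: weights[i])
    reset = set(order[:n_reset])
    return [0 if i in reset else 1 for i in range(len(weights))]


def perturb(weights, reset_fraction):
    """Reset the masked weights to zero and keep the rest unchanged."""
    mask = build_mask(weights, reset_fraction)
    return [w if m else 0.0 for w, m in zip(weights, mask)]


def train_with_perturbation(weights, update_step, n_epochs, period, reset_fraction):
    """Apply `update_step` every epoch and perturb every `period` epochs."""
    for epoch in range(1, n_epochs + 1):
        weights = update_step(weights)  # stand-in for one training epoch
        if epoch % period == 0:
            weights = perturb(weights, reset_fraction)
    return weights
```

Here `update_step` is a placeholder for whatever optimizer step the autoencoder uses; the sketch only shows where the periodic mask would be applied.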
Affiliation(s)
- Manar D Samad
- Dept. of Computer Science, Tennessee State University, Nashville, TN, USA
- Rahim Hossain
- Dept. of EEE, Bangladesh Univ. of Eng. and Tech., Dhaka, Bangladesh
- Khan M Iftekharuddin
- Dept. of Electrical and Computer Engineering, Old Dominion University, Norfolk, VA, USA
|
20
|
Wang Y, Bian ZP, Hou J, Chau LP. Convolutional Neural Networks With Dynamic Regularization. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:2299-2304. [PMID: 32511095 DOI: 10.1109/tnnls.2020.2997044] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 06/11/2023]
Abstract
Regularization is commonly used for alleviating overfitting in machine learning. For convolutional neural networks (CNNs), regularization methods, such as DropBlock and Shake-Shake, have illustrated the improvement in the generalization performance. However, these methods lack a self-adaptive ability throughout training. That is, the regularization strength is fixed to a predefined schedule, and manual adjustments are required to adapt to various network architectures. In this article, we propose a dynamic regularization method for CNNs. Specifically, we model the regularization strength as a function of the training loss. According to the change of the training loss, our method can dynamically adjust the regularization strength in the training procedure, thereby balancing the underfitting and overfitting of CNNs. With dynamic regularization, a large-scale model is automatically regularized by the strong perturbation, and vice versa. Experimental results show that the proposed method can improve the generalization capability on off-the-shelf network architectures and outperform state-of-the-art regularization methods.
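As a rough illustration of a regularization strength driven by the training loss, the sketch below raises the strength when the loss falls and lowers it when the loss rises. The update rule, the clipping range, and the names `update_strength`, `strength_schedule`, `rate`, and `max_strength` are hypothetical; the abstract does not give the paper's actual function.

```python
def update_strength(strength, prev_loss, loss, rate=0.5, max_strength=1.0):
    """Hypothetical rule: raise the strength when the training loss falls."""
    delta = prev_loss - loss  # positive when the training loss decreased
    new_strength = strength + rate * delta
    # Clip so the strength stays within [0, max_strength].
    return min(max(new_strength, 0.0), max_strength)


def strength_schedule(losses, initial_strength=0.0):
    """Return the strength obtained after each change in training loss."""
    strengths = []
    strength = initial_strength
    for prev_loss, loss in zip(losses, losses[1:]):
        strength = update_strength(strength, prev_loss, loss)
        strengths.append(strength)
    return strengths
```

A falling loss pushes the strength up (to counter overfitting) and a rising loss pulls it back down (to counter underfitting), which is the balancing behavior the abstract describes in general terms.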
|
21
|
|
22
|
|
23
|
|
24
|
Bejani MM, Ghatee M. Theory of adaptive SVD regularization for deep neural networks. Neural Netw 2020; 128:33-46. [PMID: 32413786 DOI: 10.1016/j.neunet.2020.04.021] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/07/2019] [Revised: 02/24/2020] [Accepted: 04/23/2020] [Indexed: 12/15/2022]
Abstract
Deep networks can learn complex problems; however, they suffer from overfitting. To solve this problem, regularization methods have been proposed that are not adaptable to the dynamic changes in the training process. With a different approach, this paper presents a regularization method based on the Singular Value Decomposition (SVD) that adjusts the learning model adaptively. To this end, the overfitting can be evaluated by condition numbers of the synaptic matrices. When the overfitting is high, the matrices are substituted with their SVD approximations. Some theoretical results are derived to show the performance of this regularization method. It is proved that SVD approximation cannot solve overfitting after several iterations. Thus, a new Tikhonov term is added to the loss function to converge the synaptic weights to the SVD approximation of the best-found results. Following this approach, an Adaptive SVD Regularization (ASR) is proposed to adjust the learning model with respect to the dynamic training characteristics. ASR results are visualized to show how ASR overcomes overfitting. The different configurations of Convolutional Neural Networks (CNN) are implemented with different augmentation schemes to compare ASR with state-of-the-art regularization methods. The results show that on MNIST, F-MNIST, SVHN, CIFAR-10 and CIFAR-100, the accuracies of ASR are 99.4%, 95.7%, 97.1%, 93.2% and 55.6%, respectively. Although ASR improves the overfitting and validation loss, its elapsed time is not significantly greater than the learning without regularization.
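The quantities named in this abstract can be written out as follows. This is a sketch in assumed notation, not the paper's own formulation: the symbols $W$, $\tilde{W}_k$, $\lambda$, the choice of a rank-$k$ truncation, and the squared Frobenius norm in the penalty are all assumptions.

```latex
% Condition number of a synaptic (weight) matrix W with singular values
% \sigma_1 \ge \dots \ge \sigma_r > 0:
\kappa(W) = \frac{\sigma_1}{\sigma_r}

% Rank-k SVD approximation of W (assumed form of the "SVD approximation"):
\tilde{W}_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top}, \qquad k \le r

% Loss with an added Tikhonov-style term pulling W toward \tilde{W}_k
% (assumed form; \lambda > 0 is a hypothetical weighting parameter):
\mathcal{L}_{\mathrm{reg}}(W) = \mathcal{L}(W) + \lambda \, \lVert W - \tilde{W}_k \rVert_F^{2}
```

A large $\kappa(W)$ would be the signal that overfitting is high, at which point the abstract says the matrix is replaced by its SVD approximation.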
Affiliation(s)
- Mohammad Mahdi Bejani
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran.
- Mehdi Ghatee
- Department of Computer Science, Faculty of Mathematics and Computer Science, Amirkabir University of Technology (Tehran Polytechnic), Iran.
|
25
|
Gui YM, Wang RJ, Wang X, Wei YY. Using Deep Neural Networks to Improve the Performance of Protein–Protein Interactions Prediction. Int J Pattern Recogn 2020. [DOI: 10.1142/s0218001420520126] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
Protein–protein interactions (PPIs) help to elucidate the molecular mechanisms of life activities and have a certain role in promoting disease treatment and new drug development. With the advent of the proteomics era, some PPIs prediction methods have emerged. However, the performances of these PPIs prediction methods still need to be optimized and improved. In order to optimize the performance of the PPIs prediction methods, we used the dropout method to reduce over-fitting by deep neural networks (DNNs), and combined with three types of feature extraction methods, conjoint triad (CT), auto covariance (AC) and local descriptor (LD), to build DNN models based on amino acid sequences. The results showed that the accuracy of the CT, AC and LD increased from 97.11% to 98.12%, 96.84% to 98.17%, and 95.30% to 95.60%, respectively. The loss values of the CT, AC and LD decreased from 27.47% to 14.96%, 65.91% to 17.82% and 36.23% to 15.34%, respectively. Experimental results show that dropout can optimize the performances of the DNN models. The results can provide a resource for scholars in future studies involving the prediction of PPIs. The experimental code is available at https://github.com/smalltalkman/hppi-tensorflow .
Affiliation(s)
- Yuan-Miao Gui
- Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- University of Science and Technology of China, Hefei City, Anhui Province, P. R. China
- Ru-Jing Wang
- Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Xue Wang
- Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
- Yuan-Yuan Wei
- Institute of Intelligent Machines, Hefei Institute of Physics, Chinese Academy of Sciences, Hefei City, Anhui Province, P. R. China
|
26
|
Feng S, Ren W, Han M, Chen YW. Robust manifold broad learning system for large-scale noisy chaotic time series prediction: A perturbation perspective. Neural Netw 2019; 117:179-190. [PMID: 31170577 DOI: 10.1016/j.neunet.2019.05.009] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/05/2018] [Revised: 05/07/2019] [Accepted: 05/09/2019] [Indexed: 11/28/2022]
Abstract
Noise and outliers commonly exist in dynamical systems because of sensor disturbances or extreme dynamics. Thus, robustness and generalization capacity are of vital importance for system modeling. In this paper, the robust manifold broad learning system (RM-BLS) is proposed for system modeling and large-scale noisy chaotic time series prediction. Manifold embedding is utilized for chaotic system evolution discovery. The manifold representation is randomly corrupted by perturbations while the features not related to low-dimensional manifold embedding are discarded by feature selection. This leads to a robust learning paradigm and achieves better generalization performance. We also develop an efficient solution for Stiefel manifold optimization, in which the orthogonal constraints are maintained by Cayley transformation and a curvilinear search algorithm. Furthermore, we discuss the common ideas shared by random perturbation approximation and other mainstream regularization methods. We also prove the equivalence between perturbations to manifold embedding and Tikhonov regularization. Simulation results on large-scale noisy chaotic time series prediction illustrate the robustness and generalization performance of our method.
Affiliation(s)
- Shoubo Feng
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China.
- Weijie Ren
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China.
- Min Han
- Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024, China.
- Yen Wei Chen
- Graduate School of Information Science and Engineering, Ritsumeikan University, Shiga, Japan.
|
27
|
Yang L, Song Q, Wu Y, Hu M. Attention Inspiring Receptive-Fields Network for Learning Invariant Representations. IEEE Transactions on Neural Networks and Learning Systems 2019; 30:1744-1755. [PMID: 30371393 DOI: 10.1109/tnnls.2018.2873722] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
In this paper, we describe a simple and highly efficient module for image classification, which we term the "Attention Inspiring Receptive-fields" (Air) module. We effectively convert the spatial attention mechanism into a plug-in module. In addition, we reveal the relationship between the spatial attention mechanism and the receptive fields, indicating that the proper use of the spatial attention mechanism can effectively increase the receptive fields of the module, which is able to enhance translation invariance and scale invariance of the network. By integrating the Air module into advanced convolutional neural networks (such as ResNet and ResNeXt), we can construct AirNet architectures for learning invariant representations and gain significant improvements on challenging data sets. We present extensive experiments on CIFAR and ImageNet data sets to verify the effectiveness and feature invariance of the Air module and explore more concise and efficient designs of the proposed module. On ImageNet classification, our AirNet-50 and AirNet-101 (ResNet-50/101 with Air module) achieve 1.69% and 1.50% top-1 accuracy improvement with a small amount of extra computation and parameters compared with the original ResNet. We make models and code publicly available at https://github.com/soeaver/AirNet-PyTorch. We further demonstrate that AirNet has a good ability for transfer learning and measure the performance on Microsoft Common Objects in Context object detection, instance segmentation, and pose estimation.
|
28
|
Matsuzaka Y, Uesawa Y. Optimization of a Deep-Learning Method Based on the Classification of Images Generated by Parameterized Deep Snap a Novel Molecular-Image-Input Technique for Quantitative Structure-Activity Relationship (QSAR) Analysis. Front Bioeng Biotechnol 2019; 7:65. [PMID: 30984753 PMCID: PMC6447703 DOI: 10.3389/fbioe.2019.00065] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 09/28/2018] [Accepted: 03/07/2019] [Indexed: 12/22/2022] Open
Abstract
Numerous chemical compounds are distributed around the world and may affect the homeostasis of the endocrine system by disrupting the normal functions of hormone receptors. Although the risks associated with these compounds have been evaluated by acute toxicity testing in mammalian models, the chronic toxicity of many chemicals remains unknown due to the high cost of the compounds and the testing. However, computational approaches may be promising alternatives and reduce these evaluations. Deep learning (DL) has been shown to yield prediction models with high accuracy for recognition of images, speech, signals, and videos since it greatly benefits from large datasets. Recently, a novel DL-based technique called DeepSnap was developed to conduct QSAR analysis using three-dimensional images of chemical structures. It can be used to predict the potential toxicity of many different chemicals to various receptors without extraction of descriptors. DeepSnap has been shown to have a very high capacity in tests using Tox21 quantitative qHTP datasets. Numerous parameters must be adjusted to use the DeepSnap method but they have not been optimized. In this study, the effects of these parameters on the performance of the DL prediction model were evaluated, with the validation loss as the performance indicator, using the toxicity information in the Tox21 qHTP database. The relations between the DeepSnap parameters, such as (1) number of molecules per SDF split, (2) zoom factor percentage, (3) atom size for van der Waals percentage, (4) bond radius, (5) minimum bond distance, and (6) bond tolerance, and the validation loss followed quadratic function curves, which suggests that optimal thresholds exist to attain the best performance with these prediction models.
Using the parameter values set with the best performance, the prediction model of chemical compounds for CAR agonist was built using 64 images, at 105° angle, with AUC of 0.791. Thus, based on these parameters, the proposed DeepSnap-DL approach will be highly reliable and beneficial to establish models to assess the risk associated with various chemicals.
Affiliation(s)
- Yoshihiro Uesawa
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Tokyo, Japan
|