1
Yan W, Yu F, Tan L, Mengshan L, Xiaojun X, Weihong Z, Sheng S, Jun W, Fu-An W. A hybrid machine learning model with attention mechanism and multidimensional multivariate feature coding for essential gene prediction. BMC Biol 2025; 23:108. [PMID: 40275343 DOI: 10.1186/s12915-025-02209-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2024] [Accepted: 04/07/2025] [Indexed: 04/26/2025] Open
Abstract
BACKGROUND Essential genes are crucial for the development, inheritance, and survival of species. Studying these genes can unravel complex mechanisms and fundamental life processes and identify potential therapeutic targets for various diseases, so their identification is important. Machine learning has become the mainstream approach for essential gene prediction. However, some key challenges remain, such as the extraction of genetic features, the impact of imbalanced data, and cross-species generalization ability. RESULTS Here, we propose a hybrid machine learning model for essential gene prediction, called EGP Hybrid-ML, based on graph convolutional neural networks (GCN) and bi-directional long short-term memory (Bi-LSTM) with an attention mechanism and multidimensional multivariate feature coding. In the model, GCN was used to extract feature encoding information from the visualized graphics of gene sequences, and the attention mechanism was combined with Bi-LSTM to assess the importance of each feature in gene sequences and to analyze the influences of different feature encoding methods and data imbalance. Additionally, the cross-species predictive performance of the model was evaluated through cross-validation. The results indicated that the sensitivity of the EGP Hybrid-ML model reached 0.9122. CONCLUSIONS This model demonstrated superior predictive performance and strong generalization capabilities compared to other models. The EGP Hybrid-ML model proposed in this paper has broad application prospects in bioinformatics, chemical information, and pharmaceutical information. The code, architectures, parameters, and datasets of the proposed model are freely available on GitHub (https://github.com/gnnumsli/EGP-Hybrid-ML).
Affiliation(s)
- Wu Yan
  - Gannan Normal University, Ganzhou, Jiangxi, 341000, China
  - Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
  - Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Fu Yu
  - Ganzhou Power Supply Branch of State Grid, Jiangxi Electric Power Co., Ltd, Ganzhou, Jiangxi, 341000, China
- Li Tan
  - Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Li Mengshan
  - Gannan Normal University, Ganzhou, Jiangxi, 341000, China
  - Ganzhou Power Supply Branch of State Grid, Jiangxi Electric Power Co., Ltd, Ganzhou, Jiangxi, 341000, China
- Xie Xiaojun
  - Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Zhou Weihong
  - Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
  - Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Sheng Sheng
  - Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
  - Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wang Jun
  - Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
  - Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
- Wu Fu-An
  - Jiangsu University of Science and Technology, Zhenjiang, Jiangsu, 212018, China
  - Sericultural Research Institute, Chinese Academy of Agricultural Sciences, Zhenjiang, Jiangsu, 212018, China
2
Asim MN, Ibrahim MA, Zaib A, Dengel A. DNA sequence analysis landscape: a comprehensive review of DNA sequence analysis task types, databases, datasets, word embedding methods, and language models. Front Med (Lausanne) 2025; 12:1503229. [PMID: 40265190 PMCID: PMC12011883 DOI: 10.3389/fmed.2025.1503229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Accepted: 03/10/2025] [Indexed: 04/24/2025] Open
Abstract
Deoxyribonucleic acid (DNA) serves as the fundamental genetic blueprint that governs the development, functioning, growth, and reproduction of all living organisms. DNA can be altered through germline and somatic mutations. Germline mutations underlie hereditary conditions, while somatic mutations can be induced by various factors, including environmental influences, chemicals, lifestyle choices, and errors in DNA replication and repair mechanisms, and can lead to cancer. DNA sequence analysis plays a pivotal role in uncovering the intricate information embedded within an organism's genetic blueprint and understanding the factors that can modify it. This analysis helps in the early detection of genetic diseases and the design of targeted therapies. DNA sequence analysis through traditional wet-lab experimental methods is costly, time-consuming, and prone to errors. To accelerate large-scale DNA sequence analysis, researchers are developing AI applications that complement wet-lab experimental methods. These AI approaches can help generate hypotheses, prioritize experiments, and interpret results by identifying patterns in large genomic datasets. Effective integration of AI methods with experimental validation requires scientists to understand both fields. Considering the need for a comprehensive review that bridges the gap between both fields, the contributions of this paper are manifold: It presents a diverse range of DNA sequence analysis tasks and AI methodologies. It equips AI researchers with essential biological knowledge of 44 distinct DNA sequence analysis tasks and aligns these tasks with 3 distinct AI paradigms, namely, classification, regression, and clustering. It streamlines the integration of AI into DNA sequence analysis tasks by consolidating information on 36 diverse biological databases that can be used to develop benchmark datasets for 44 different DNA sequence analysis tasks. To support performance comparisons between new and existing AI predictors, it provides insights into 140 benchmark datasets related to 44 distinct DNA sequence analysis tasks. It presents applications of word embeddings and language models across 44 distinct DNA sequence analysis tasks. It streamlines the development of new predictors by providing a comprehensive survey of the performance of predictive pipelines based on 39 word embeddings and 67 language models, as well as of top-performing traditional sequence encoding-based predictors, across 44 DNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Arooj Zaib
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
  - Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, Germany
3
Li F, Bin Y, Zhao J, Zheng C. DeepPD: A Deep Learning Method for Predicting Peptide Detectability Based on Multi-feature Representation and Information Bottleneck. Interdiscip Sci 2025; 17:200-214. [PMID: 39661307 DOI: 10.1007/s12539-024-00665-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 10/07/2024] [Accepted: 10/09/2024] [Indexed: 12/12/2024]
Abstract
Peptide detectability measures the relationship between the protein composition and abundance in the sample and the peptides identified during the analytical procedure. This relationship has significant implications for the fundamental tasks of proteomics. Existing methods primarily rely on a single type of feature representation, which limits their ability to capture the intricate and diverse characteristics of peptides. In response to this limitation, we introduce DeepPD, an innovative deep learning framework incorporating multi-feature representation and the information bottleneck principle (IBP) to predict peptide detectability. DeepPD extracts semantic information from peptides using evolutionary scale modeling 2 (ESM-2) and integrates sequence and evolutionary information to construct the feature space collaboratively. The IBP effectively guides the feature learning process, minimizing redundancy in the feature space. Experimental results across various datasets demonstrate that DeepPD outperforms state-of-the-art methods. Furthermore, we demonstrate that DeepPD exhibits competitive generalization and transfer learning capabilities across diverse datasets and species. In conclusion, DeepPD emerges as the most effective method for predicting peptide detectability, showcasing its potential applicability to other protein sequence prediction tasks.
Affiliation(s)
- Fenglin Li
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China
- Yannan Bin
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China
- Jianping Zhao
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China
- Chunhou Zheng
  - College of Mathematics and System Science, Xinjiang University, Urumqi, 830046, China
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, 230601, China
4
Granata I, Maddalena L, Manzo M, Guarracino MR, Giordano M. HELP: A computational framework for labelling and predicting human common and context-specific essential genes. PLoS Comput Biol 2024; 20:e1012076. [PMID: 39331694 PMCID: PMC11463781 DOI: 10.1371/journal.pcbi.1012076] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/12/2024] [Revised: 10/09/2024] [Accepted: 08/19/2024] [Indexed: 09/29/2024] Open
Abstract
Machine learning-based approaches are particularly suitable for identifying essential genes as they allow the generation of predictive models trained on features from multi-source data. Gene essentiality is neither binary nor static but determined by the context. The databases for essential gene annotation do not permit the personalisation of the context, and their update can be slower than the publication of new experimental data. We propose HELP (Human Gene Essentiality Labelling & Prediction), a computational framework for labelling and predicting essential genes. Its double scope allows for identifying genes based on dependency or not on experimental data. The effectiveness of the labelling method was demonstrated by comparing it with other approaches in overlapping the reference sets of essential gene annotations, where HELP demonstrated the best compromise between false and true positive rates. The gene attributes, including multi-omics and network embedding features, lead to high-performance prediction of essential genes while confirming the existence of essentiality nuances.
Affiliation(s)
- Ilaria Granata
  - Institute for High-Performance Computing and Networking, National Research Council, Naples, Italy
- Lucia Maddalena
  - Institute for High-Performance Computing and Networking, National Research Council, Naples, Italy
- Mario Manzo
  - Information Technology Services, University of Naples “L’Orientale”, Naples, Italy
- Mario Rosario Guarracino
  - Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
  - Department of Economics and Law, University of Cassino and Southern Lazio, Cassino, Frosinone, Italy
- Maurizio Giordano
  - Institute for High-Performance Computing and Networking, National Research Council, Naples, Italy
5
Hu W, Li M, Xiao H, Guan L. Essential genes identification model based on sequence feature map and graph convolutional neural network. BMC Genomics 2024; 25:47. [PMID: 38200437 PMCID: PMC10777564 DOI: 10.1186/s12864-024-09958-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2023] [Accepted: 01/01/2024] [Indexed: 01/12/2024] Open
Abstract
BACKGROUND Essential genes encode functions that play a vital role in the life activities of organisms, encompassing growth, development, immune system functioning, and cell structure maintenance. Conventional experimental techniques for identifying essential genes are resource-intensive and time-consuming, and the accuracy of current machine learning models needs further enhancement. Therefore, it is crucial to develop a robust computational model to accurately predict essential genes. RESULTS In this study, we introduce GCNN-SFM, a computational model for identifying essential genes in organisms, based on graph convolutional neural networks (GCNN). GCNN-SFM integrates a graph convolutional layer, a convolutional layer, and a fully connected layer to model and extract features from gene sequences of essential genes. Initially, the gene sequence is transformed into a feature map using coding techniques. Subsequently, a multi-layer GCN is employed to perform graph convolution operations, effectively capturing both local and global features of the gene sequence. Further feature extraction is performed, followed by integrating convolution and fully-connected layers to generate prediction results for essential genes. The gradient descent algorithm is utilized to iteratively update the cross-entropy loss function, thereby enhancing the accuracy of the prediction results. Meanwhile, model parameters are tuned to determine the optimal parameter combination that yields the best prediction performance during training. CONCLUSIONS Experimental evaluation demonstrates that GCNN-SFM surpasses various advanced essential gene prediction models and achieves an average accuracy of 94.53%. This study presents a novel and effective approach for identifying essential genes, which has significant implications for biology and genomics research.
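The graph convolution operation mentioned in this abstract can be illustrated with a minimal sketch. It assumes the widely used symmetric-normalized propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 H W); the abstract does not state which variant GCNN-SFM uses, and the function name `gcn_layer`, the toy graph, and the weights are hypothetical.

```python
import math


def gcn_layer(adj, feats, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj     -- n x n adjacency matrix (list of lists, 0/1 entries)
    feats   -- n x f node feature matrix H
    weights -- f x o weight matrix W (fixed here; learned in a real model)
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of the node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    n_in, n_out = len(weights), len(weights[0])
    # Aggregate neighbour features, then apply the linear map and ReLU.
    agg = [[sum(norm[i][k] * feats[k][f] for k in range(n))
            for f in range(n_in)] for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * weights[f][o] for f in range(n_in)))
             for o in range(n_out)] for i in range(n)]


# Toy example: two connected nodes with one feature each.
toy_adj = [[0, 1], [1, 0]]
toy_feats = [[1.0], [3.0]]
toy_out = gcn_layer(toy_adj, toy_feats, [[1.0]])
```

With two connected nodes, each output is the average of both node features. Stacking several such layers lets information spread over longer paths in the graph, which is one way to read the abstract's "local and global features".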
Affiliation(s)
- Wenxing Hu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Mengshan Li
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Haiyang Xiao
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
- Lixin Guan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, 341000, China
6
Shim J, Koo J, Park Y. A Methodology of Condition Monitoring System Utilizing Supervised and Semi-Supervised Learning in Railway. Sensors (Basel) 2023; 23:9075. [PMID: 38005464 PMCID: PMC10674533 DOI: 10.3390/s23229075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/08/2023] [Accepted: 10/28/2023] [Indexed: 11/26/2023]
Abstract
In this paper, research was conducted on anomaly detection of wheel flats. In the railway sector, conducting tests with actual railway vehicles is challenging due to safety concerns for passengers and maintenance issues as it is a public industry. Therefore, dynamics software was utilized. Next, STFT (short-time Fourier transform) was performed to create spectrogram images. In the case of railway vehicles, control, monitoring, and communication are performed through TCMS, but complex analysis and data processing are difficult because there are no devices such as GPUs. Furthermore, there are memory limitations. Therefore, in this paper, the relatively lightweight models LeNet-5, ResNet-20, and MobileNet-V3 were selected for deep learning experiments. At this time, the LeNet-5 and MobileNet-V3 models were modified from the basic architecture. Since railway vehicles are given preventive maintenance, it is difficult to obtain fault data. Therefore, semi-supervised learning was also performed. At this time, the Deep One Class Classification paper was referenced. The evaluation results indicated that the modified LeNet-5 and MobileNet-V3 models achieved approximately 97% and 96% accuracy, respectively. At this point, the LeNet-5 model showed a training time of 12 min faster than the MobileNet-V3 model. In addition, the semi-supervised learning results showed a significant outcome of approximately 94% accuracy when considering the railway maintenance environment. In conclusion, considering the railway vehicle maintenance environment and device specifications, it was inferred that the relatively simple and lightweight LeNet-5 model can be effectively utilized while using small images.
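The STFT step that turns a vibration signal into a spectrogram can be sketched as follows. This is a minimal standard-library illustration, not the authors' pipeline: the function name `stft_magnitude`, the Hann window, the frame length, and the hop size are all assumptions, and the naive DFT is only suitable for short signals.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of magnitude spectra, one per Hann-windowed frame.

    Each spectrum holds frame_len // 2 + 1 bins (0 .. Nyquist).
    """
    # Periodic Hann window to reduce spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive discrete Fourier transform of the windowed frame.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Toy signal: a pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spectrogram = stft_magnitude(tone)
```

Rendering the rows of `spectrogram` as pixel columns gives the kind of image a small CNN could take as input. A periodic impact such as a wheel flat would be expected to appear as a repeating broadband pattern rather than the single bright band this toy tone produces.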
Affiliation(s)
- Jaeseok Shim
  - Complex Research Center for Materials & Components of Railway, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Jeongseo Koo
  - Department of Railway Safety Engineering, Seoul National University of Science and Technology, Seoul 01811, Republic of Korea
- Yongwoon Park
  - A2Mind, 213, Toegye-ro, Jung-gu, Seoul 04557, Republic of Korea
7
Liu X, Teng L, Luo Y, Xu Y. Prediction of prokaryotic and eukaryotic promoters based on information-theoretic features. Biosystems 2023; 231:104979. [PMID: 37423595 DOI: 10.1016/j.biosystems.2023.104979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 07/06/2023] [Accepted: 07/07/2023] [Indexed: 07/11/2023]
Abstract
Promoters are DNA regulatory elements located near the transcription start site and are responsible for regulating the transcription of genes. DNA fragments arranged in a certain order form specific functional regions with different information contents. Information theory is the science that studies the extraction, measurement, and transmission of information. The genetic information contained in DNA follows the general laws of information storage. Therefore, methods from information theory can be used to analyze promoters, which carry genetic information. In this study, we applied information theory to promoter prediction. We used 107 features extracted with information-theoretic methods and a backpropagation neural network to build a classifier. Then, the trained classifier was applied to predict the promoters of 6 organisms. The average AUCs of the 6 organisms obtained by using hold-out validation and ten-fold cross-validation were 0.885 and 0.886, respectively. The results verified the effectiveness of information-theoretic features in promoter prediction. Considering the possible redundancy in the feature set, we performed feature selection and obtained key feature subsets related to promoter characteristics. The results indicate the potential utility of information-theoretic features in promoter prediction.
Affiliation(s)
- Xiao Liu
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Li Teng
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Yachuan Luo
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
- Yuqiao Xu
  - School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing, 400044, China
8
Rescifina A. Progress of the "Molecular Informatics" Section in 2022. Int J Mol Sci 2023; 24:9442. [PMID: 37298393 DOI: 10.3390/ijms24119442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2023] [Accepted: 05/19/2023] [Indexed: 06/12/2023] Open
Abstract
This is the first Editorial of the "Molecular Informatics" Section (MIS) of the International Journal of Molecular Sciences (IJMS), which was created towards the end of 2018 (the first article was submitted on 27 September 2018) and has experienced significant growth from 2018 to now [...].
Affiliation(s)
- Antonio Rescifina
  - Department of Drug and Health Sciences, University of Catania, Viale Andrea Doria 6, 95125 Catania, Italy
9
Kha QH, Le VH, Hung TNK, Nguyen NTK, Le NQK. Development and Validation of an Explainable Machine Learning-Based Prediction Model for Drug-Food Interactions from Chemical Structures. Sensors (Basel) 2023; 23:3962. [PMID: 37112302 PMCID: PMC10143839 DOI: 10.3390/s23083962] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 01/29/2023] [Revised: 03/26/2023] [Accepted: 04/12/2023] [Indexed: 06/19/2023]
Abstract
Possible drug-food constituent interactions (DFIs) could change the intended efficiency of particular therapeutics in medical practice. The increasing number of multiple-drug prescriptions leads to a rise in drug-drug interactions (DDIs) and DFIs. These adverse interactions have further implications, e.g., a decline in a medicament's effect, the withdrawal of various medications, and harmful impacts on patients' health. However, the importance of DFIs remains underestimated, as the number of studies on these topics is limited. Recently, scientists have applied artificial intelligence-based models to study DFIs. However, there were still some limitations in data mining, input, and detailed annotations. This study proposed a novel prediction model to address the limitations of previous studies. In detail, we extracted 70,477 food compounds from the FooDB database and 13,580 drugs from the DrugBank database. We extracted 3780 features from each drug-food compound pair. The optimal model was eXtreme Gradient Boosting (XGBoost). We also validated the performance of our model on one external test set from a previous study which contained 1922 DFIs. Finally, we applied our model to recommend whether a drug should or should not be taken with some food compounds based on their interactions. The model can provide highly accurate and clinically relevant recommendations, especially for DFIs that may cause severe adverse events and even death. Our proposed model can contribute to developing more robust predictive models to help patients, under the supervision and consultation of physicians, avoid DFI adverse effects in combining drugs and foods for therapy.
Affiliation(s)
- Quang-Hien Kha
  - International Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
- Viet-Huan Le
  - International Ph.D. Program in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
  - Department of Thoracic Surgery, Khanh Hoa General Hospital, Nha Trang City 65000, Vietnam
- Ngan Thi Kim Nguyen
  - Undergraduate Program of Nutrition Science, National Taiwan Normal University, Taipei 106, Taiwan
- Nguyen Quoc Khanh Le
  - AIBioMed Research Group, Taipei Medical University, Taipei 110, Taiwan
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 110, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
10
A General Hybrid Modeling Framework for Systems Biology Applications: Combining Mechanistic Knowledge with Deep Neural Networks under the SBML Standard. AI 2023. [DOI: 10.3390/ai4010014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023] Open
Abstract
In this paper, a computational framework is proposed that merges mechanistic modeling with deep neural networks obeying the Systems Biology Markup Language (SBML) standard. Over the last 20 years, the systems biology community has developed a large number of mechanistic models that are currently stored in public databases in SBML. With the proposed framework, existing SBML models may be redesigned into hybrid systems through the incorporation of deep neural networks into the model core, using a freely available python tool. The so-formed hybrid mechanistic/neural network models are trained with a deep learning algorithm based on the adaptive moment estimation method (ADAM), stochastic regularization and semidirect sensitivity equations. The trained hybrid models are encoded in SBML and uploaded in model databases, where they may be further analyzed as regular SBML models. This approach is illustrated with three well-known case studies: the Escherichia coli threonine synthesis model, the P58IPK signal transduction model, and the Yeast glycolytic oscillations model. The proposed framework is expected to greatly facilitate the widespread use of hybrid modeling techniques for systems biology applications.
11
Rout RK, Umer S, Khandelwal M, Pati S, Mallik S, Balabantaray BK, Qin H. Identification of discriminant features from stationary pattern of nucleotide bases and their application to essential gene classification. Front Genet 2023; 14:1154120. [PMID: 37152988 PMCID: PMC10156977 DOI: 10.3389/fgene.2023.1154120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/30/2023] [Accepted: 04/04/2023] [Indexed: 05/09/2023] Open
Abstract
Introduction: Essential genes are indispensable for the survival of various species. They form a family of genes linked to critical cellular activities. These genes code for proteins that regulate central metabolism, gene translation, deoxyribonucleic acid replication, and fundamental cellular structure and that facilitate intracellular and extracellular transport. Essential genes preserve crucial genomic information that may hold the key to a detailed knowledge of life and evolution. Because of this relevance, essential gene studies have long been regarded as a vital topic in computational biology. An essential gene is composed of adenine, guanine, cytosine, and thymine and their various combinations. Methods: This paper presents a novel method of extracting information on the stationary patterns of nucleotides such as adenine, guanine, cytosine, and thymine in each gene. For this purpose, several co-occurrence matrices are derived that provide the statistical distribution of stationary patterns of nucleotides in the genes, which helps establish the relationship between the nucleotides. To extract discriminant features, energy, entropy, homogeneity, contrast, and dissimilarity are computed from each co-occurrence matrix, and the features from all co-occurrence matrices are concatenated to form a feature vector representing each essential gene. Finally, supervised machine learning algorithms are applied for essential gene classification based on the extracted fixed-dimensional feature vectors. Results: For comparison, some existing state-of-the-art feature representation techniques such as Shannon entropy (SE), Hurst exponent (HE), fractal dimension (FD), and their combinations have been utilized. Discussion: Extensive experiments on classifying the essential genes of five species show the robustness and effectiveness of the proposed methodology.
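The feature pipeline outlined in the Methods part of this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the co-occurrence matrices are taken to count nucleotide pairs at a fixed distance, the five features use the standard grey-level co-occurrence (texture) definitions, and the names `cooccurrence_matrix`, `texture_features`, `feature_vector`, the A/C/G/T index order, and the offsets are all hypothetical.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGT"  # index order is an arbitrary choice for this sketch
FEATURE_NAMES = ("energy", "entropy", "homogeneity", "contrast", "dissimilarity")


def cooccurrence_matrix(seq, offset=1):
    """Normalized 4 x 4 matrix of nucleotide pairs `offset` positions apart."""
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    counts = [[0.0] * 4 for _ in range(4)]
    total = 0
    for a, b in zip(seq, seq[offset:]):
        if a in index and b in index:  # skip ambiguous symbols such as N
            counts[index[a]][index[b]] += 1
            total += 1
    if total == 0:
        return counts
    return [[c / total for c in row] for row in counts]


def texture_features(p):
    """Energy, entropy, homogeneity, contrast, dissimilarity of matrix p."""
    feats = dict.fromkeys(FEATURE_NAMES, 0.0)
    for i, j in product(range(4), repeat=2):
        v = p[i][j]
        feats["energy"] += v * v
        if v > 0:
            feats["entropy"] -= v * math.log2(v)
        feats["homogeneity"] += v / (1 + (i - j) ** 2)
        feats["contrast"] += v * (i - j) ** 2
        feats["dissimilarity"] += v * abs(i - j)
    return feats


def feature_vector(seq, offsets=(1, 2, 3)):
    """Concatenate the five features over one matrix per offset."""
    vec = []
    for d in offsets:
        f = texture_features(cooccurrence_matrix(seq.upper(), d))
        vec.extend(f[name] for name in FEATURE_NAMES)
    return vec
```

Because the vector length depends only on the number of offsets (five features per matrix), genes of different lengths map to vectors of the same dimension, which is what allows a standard supervised classifier to be trained on them.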
Affiliation(s)
- Ranjeet Kumar Rout
  - National Institute of Technology Srinagar, Hazratbal, Jammu and Kashmir, India
- Saiyed Umer
  - Aliah University, Kolkata, West Bengal, India
- Monika Khandelwal
  - National Institute of Technology Srinagar, Hazratbal, Jammu and Kashmir, India
- Smitarani Pati
  - Dr. B R Ambedkar National Institute of Technology Jalandhar, Jalandhar, Punjab, India
- Saurav Mallik
  - Harvard T H Chan School of Public Health, Boston, United States
  - Department of Pharmacology and Toxicology, University of Arizona, Tucson, AZ, United States
- Hong Qin
  - Department of Computer Science and Engineering, University of Tennessee at Chattanooga, Chattanooga, TN, United States
*Correspondence: Saurav Mallik; Hong Qin
12
Tsimenidis S, Vrochidou E, Papakostas GA. Omics Data and Data Representations for Deep Learning-Based Predictive Modeling. Int J Mol Sci 2022; 23:12272. [PMID: 36293133 PMCID: PMC9603455 DOI: 10.3390/ijms232012272] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 08/23/2022] [Revised: 10/03/2022] [Accepted: 10/12/2022] [Indexed: 11/25/2022] Open
Abstract
Medical discoveries mainly depend on the capability to process and analyze biological datasets, which inundate the scientific community and are still expanding as the cost of next-generation sequencing technologies decreases. Deep learning (DL) is a viable method to exploit this massive data stream, since it has advanced quickly through successive innovations. However, an obstacle to scientific progress emerges: the difficulty of applying DL to biology. Both fields are evolving at a breakneck pace, which makes it hard for an individual to stay at the forefront of both. This paper aims to bridge the gap and help computer scientists bring their valuable expertise into the life sciences. This work provides an overview of the most common types of biological data and data representations that are used to train DL models, with additional information on the models themselves and the various tasks that are being tackled. This is the essential information a DL expert with no background in biology needs in order to participate in DL-based research projects in biomedicine, biotechnology, and drug discovery. This study could also be useful to researchers in biology who want to understand and utilize the power of DL to gain better insights into, and extract important information from, omics data.
Affiliation(s)
- George A. Papakostas
- MLV Research Group, Department of Computer Science, International Hellenic University, 65404 Kavala, Greece
13
Kang M, Oh JH. Editorial of Special Issue "Deep Learning and Machine Learning in Bioinformatics". Int J Mol Sci 2022; 23:6610. [PMID: 35743052 PMCID: PMC9224509 DOI: 10.3390/ijms23126610] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2022] [Accepted: 06/10/2022] [Indexed: 02/04/2023] Open
Abstract
In recent years, deep learning has emerged as a highly active research field, achieving great success in various machine learning areas, including image processing, speech recognition, and natural language processing, and now rapidly becoming a dominant tool in biomedicine [...].
Affiliation(s)
- Mingon Kang
- Department of Computer Science, University of Nevada, Las Vegas, NV 89154, USA
- Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA
14
Prediction of Intrinsically Disordered Proteins Using Machine Learning Based on Low Complexity Methods. ALGORITHMS 2022. [DOI: 10.3390/a15030086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Prediction of intrinsically disordered proteins is an active area in bioinformatics. Because evaluating the disordered regions of protein sequences with experimental methods is costly, we used a low-complexity prediction scheme. In this scheme, sequence complexity is used to calculate five features for each residue of the protein sequence: the Shannon entropy, the topological entropy, the permutation entropy, and the weighted average values of two propensities. Notably, this is the first time that permutation entropy has been applied to the field of protein sequencing. In addition, in the data preprocessing stage, an appropriately sized sliding window and a comprehensive oversampling scheme can be used to improve the prediction performance of our scheme, and two ensemble learning algorithms are also used to verify the prediction results before and after. The results show that adding permutation entropy improves the performance of the prediction algorithm: the MCC value improves from the original 0.465 to 0.526 in our scheme, supporting its universality. Finally, we compare the simulation results of our scheme with those of some existing schemes to demonstrate its effectiveness.
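As a rough illustration of the per-residue Shannon entropy feature mentioned in this abstract, the sketch below computes the entropy of the residue distribution inside a sliding window centred on each position. The function names, the default window width of 5, and the truncation of the window at the sequence ends are assumptions made for illustration; the abstract does not specify how the authors implemented these details.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the symbol distribution in `window`."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sliding_entropy(seq: str, width: int = 5) -> list[float]:
    """One entropy value per residue, using a window centred on that residue.

    Assumption: the window is simply truncated at the sequence ends.
    """
    half = width // 2
    values = []
    for i in range(len(seq)):
        lo = max(0, i - half)
        hi = min(len(seq), i + half + 1)
        values.append(shannon_entropy(seq[lo:hi]))
    return values
```

A window containing a single repeated residue has entropy 0, while a window with four equally frequent residues has entropy 2 bits, so low values flag low-complexity stretches.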
15
Qian X, Zheng H, Xue K, Chen Z, Hu Z, Zhang L, Wan J. Recurrence Risk of Liver Cancer Post-hepatectomy Using Machine Learning and Study of Correlation With Immune Infiltration. Front Genet 2021; 12:733654. [PMID: 34956309 PMCID: PMC8692778 DOI: 10.3389/fgene.2021.733654] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Accepted: 11/24/2021] [Indexed: 12/24/2022] Open
Abstract
Postoperative recurrence of liver cancer is the main obstacle to improving the survival rate of patients with liver cancer. We established an mRNA-based model to predict the risk of recurrence after hepatectomy for liver cancer and explored the relationship between immune infiltration and the risk of recurrence after hepatectomy for liver cancer. We performed a series of bioinformatics analyses on the gene expression profiles of patients with liver cancer, and selected 18 mRNAs as biomarkers for predicting the risk of recurrence of liver cancer using a machine learning method. At the same time, we evaluated the immune infiltration of the samples and analyzed it jointly with the recurrence risk of liver cancer. We found that B cell, B cell naive, T cell CD4+ memory resting, and T cell CD4+ were significantly correlated with the risk of postoperative recurrence of liver cancer. These results are helpful for early detection, intervention, and individualized treatment of patients with liver cancer after surgical resection, and help to reveal the potential mechanism of liver cancer recurrence.
Affiliation(s)
- Xiaowen Qian
- Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Huilin Zheng
- Department of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Ke Xue
- Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Zheng Chen
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, Fourth Affiliated Hospital, School of Medicine, Zhejiang University, Yiwu, China
- Zhenhua Hu
- Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, Fourth Affiliated Hospital, School of Medicine, Zhejiang University, Yiwu, China; Key Laboratory of Combined Multi-Organ Transplantation, Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, First Affiliated Hospital, School of Medicine, Zhejiang University, Ministry of Public Health Key Laboratory of Organ Transplantation, Hangzhou, China; Division of Hepatobiliary and Pancreatic Surgery, Yiwu Central Hospital, Yiwu, China
- Lei Zhang
- Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China; Department of Biological and Chemical Engineering, Zhejiang University of Science and Technology, Hangzhou, China
- Jian Wan
- Department of Information and Electronic Engineering, Zhejiang University of Science and Technology, Hangzhou, China
16
Prediction of Peptide Detectability Based on CapsNet and Convolutional Block Attention Module. Int J Mol Sci 2021; 22:12080. [PMID: 34769509 PMCID: PMC8584443 DOI: 10.3390/ijms222112080] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/18/2021] [Revised: 10/30/2021] [Accepted: 11/02/2021] [Indexed: 11/17/2022] Open
Abstract
In proteomics, the complexity of sampling in the experimental process leaves several problems with the reproducibility of mass spectrometry experiments, and peptide identification and quantification results continue to show randomness. Predicting the detectability of peptides can make these results more accurate, so such prediction is of high research significance. This study builds a novel method to predict the detectability of peptides based on the capsule network (CapsNet) and the convolutional block attention module (CBAM). First, the residue conical coordinate (RCC), the amino acid composition (AAC), the dipeptide composition (DPC), and the sequence embedding code (SEC) are extracted as the peptide chain features. Subsequently, these features are divided into biological features and sequence features, and input separately into the CapsNet neural network. Moreover, the CBAM attention module is added to the network to assign weights to channels and spatial positions, in an attempt to enhance feature learning and improve network training. To verify the effectiveness of the proposed method, it is compared with other popular methods. The experimental results show that the proposed method outperforms those methods in most performance assessments.
17
Xu Z, Luo M, Lin W, Xue G, Wang P, Jin X, Xu C, Zhou W, Cai Y, Yang W, Nie H, Jiang Q. DLpTCR: an ensemble deep learning framework for predicting immunogenic peptide recognized by T cell receptor. Brief Bioinform 2021; 22:6355415. [PMID: 34415016 DOI: 10.1093/bib/bbab335] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 05/20/2021] [Revised: 07/25/2021] [Accepted: 07/28/2021] [Indexed: 12/30/2022] Open
Abstract
Accurate prediction of immunogenic peptides recognized by T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. When TCR is not considered as a key factor, most of the antigen peptides predicted in silico fail to elicit immune responses in vivo. This inevitably leads to costly and time-consuming experimental validation tests for predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptides recognized by TCR. Here, we described DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power, with an area under the curve of up to 0.91 on COVID-19 data when predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved an overall accuracy of 81.03% on IEDB data when predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package can be downloaded from https://github.com/jiangBiolab/DLpTCR.
Affiliation(s)
- Zhaochun Xu
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Meng Luo
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Weizhong Lin
- Center for Bioinformatics, Computer Department, Jingdezhen Ceramic Institute, Jingdezhen 333403, China
- Guangfu Xue
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Pingping Wang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Xiyun Jin
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Chang Xu
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Wenyang Zhou
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Yideng Cai
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Wenyi Yang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Huan Nie
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China
- Qinghua Jiang
- Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China; Key Laboratory of Biological Data (Harbin Institute of Technology), Ministry of Education, China
18
Cheng Y, Chen C, Yang J, Yang H, Fu M, Zhong X, Wang B, He M, Hu Z, Zhang Z, Jin X, Kang Y, Wu Q. Using Machine Learning Algorithms to Predict Hospital Acquired Thrombocytopenia after Operation in the Intensive Care Unit: A Retrospective Cohort Study. Diagnostics (Basel) 2021; 11:1614. [PMID: 34573956 PMCID: PMC8466367 DOI: 10.3390/diagnostics11091614] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/06/2021] [Revised: 08/25/2021] [Accepted: 09/01/2021] [Indexed: 02/05/2023] Open
Abstract
Hospital acquired thrombocytopenia (HAT) is a common hematological complication after surgery. This research aimed to develop and compare the performance of seven machine learning (ML) algorithms for predicting patients that are at risk of HAT after surgery. We conducted a retrospective cohort study which enrolled adult patients transferred to the intensive care unit (ICU) after surgery in West China Hospital of Sichuan University from January 2016 to December 2018. All subjects were randomly divided into a derivation set (70%) and a test set (30%). Ten-fold cross-validation was used to estimate the hyperparameters of ML algorithms during the training process in the derivation set. After ML models were developed, the sensitivity, specificity, area under the curve (AUC), and net benefit (decision curve analysis, DCA) were calculated to evaluate the performances of ML models in the test set. A total of 10,369 patients were included, and HAT occurred in 1354 (13.1%). The AUC of all seven ML models exceeded 0.7; the two highest were Gradient Boosting (GB) (0.834, 0.814-0.853, p < 0.001) and Random Forest (RF) (0.828, 0.807-0.848, p < 0.001). There was no difference between GB and RF (0.834 vs. 0.828, p = 0.293); however, these two were better than the remaining five models (p < 0.001). The DCA revealed that all ML models had high net benefits with a threshold probability approximately less than 0.6. In conclusion, we found that ML models constructed from multiple preoperative variables can predict HAT in patients transferred to ICU after surgery, which can improve risk stratification and guide management in clinical practice.
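The data partitioning described in this abstract (a random 70%/30% derivation/test split, then ten-fold cross-validation inside the derivation set) can be sketched with the standard library alone. The function names, the fixed seed, and the interleaved fold assignment are illustrative assumptions; the abstract does not say which tooling or fold scheme the authors used.

```python
import random


def derivation_test_split(ids, test_fraction=0.3, seed=0):
    """Randomly split `ids` into (derivation, test) lists."""
    rng = random.Random(seed)  # fixed seed only to keep the sketch reproducible
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_folds(ids, k=10):
    """Yield (train, validation) pairs; each id is validated exactly once."""
    folds = [ids[i::k] for i in range(k)]  # interleaved assignment to k folds
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation
```

Hyperparameters would be chosen by averaging a validation metric over the ten folds, and the test set would be touched only once, after model selection.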
Affiliation(s)
- Yisong Cheng
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Chaoyue Chen
- Department of Neurosurgery, West China Hospital, Sichuan University, Chengdu 610041, China
- Jie Yang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Hao Yang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Min Fu
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Xi Zhong
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Bo Wang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Min He
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhi Hu
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Zhongwei Zhang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Xiaodong Jin
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Yan Kang
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Qin Wu
- Department of Critical Care Medicine, West China Hospital, Sichuan University, Chengdu 610041, China
- Correspondence: Tel.: +86-028-8542-2506
19
Pérez-Reynoso FD, Rodríguez-Guerrero L, Salgado-Ramírez JC, Ortega-Palacios R. Human-Machine Interface: Multiclass Classification by Machine Learning on 1D EOG Signals for the Control of an Omnidirectional Robot. SENSORS (BASEL, SWITZERLAND) 2021; 21:5882. [PMID: 34502773 PMCID: PMC8434373 DOI: 10.3390/s21175882] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 07/29/2021] [Revised: 08/24/2021] [Accepted: 08/26/2021] [Indexed: 01/25/2023]
Abstract
People with severe disabilities require assistance to perform their routine activities; a Human-Machine Interface (HMI) allows them to activate devices that respond according to their needs. In this work, an HMI based on electrooculography (EOG) is presented. The instrumentation is placed on portable glasses that acquire both horizontal and vertical EOG signals. Each recorded eye movement is assigned a class and encoded using the one-hot encoding technique to test the precision and sensitivity of different machine learning classification algorithms capable of identifying new data from the eye recordings; the algorithm discriminates blinks so that they do not disturb the acquisition of the eyeball position commands. The classifier is implemented in the control of a three-wheeled omnidirectional robot to validate the response of the interface. This work proposes the classification of signals in real time and the customization of the interface, minimizing the user's learning curve. Preliminary results showed that it is possible to generate trajectories to control an omnidirectional robot, with a view to implementing a future assistance system that controls position through gaze orientation.
Affiliation(s)
- Liliam Rodríguez-Guerrero
- Research Center on Technology of Information and Systems (CITIS), Electric and Control Academic Group, Universidad Autónoma del Estado de Hidalgo (UAEH), Pachuca de Soto 42039, Mexico
- Rocío Ortega-Palacios
- Biomedical Engineering, Universidad Politécnica de Pachuca (UPP), Zempoala 43830, Mexico
20
Huang L, Lin L, Fu X, Meng C. Development and validation of a novel survival model for acute myeloid leukemia based on autophagy-related genes. PeerJ 2021; 9:e11968. [PMID: 34447636 PMCID: PMC8364747 DOI: 10.7717/peerj.11968] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/19/2021] [Accepted: 07/23/2021] [Indexed: 12/21/2022] Open
Abstract
Background Acute myeloid leukemia (AML) is one of the most common blood cancers, and is characterized by impaired hematopoietic function and bone marrow (BM) failure. Under normal circumstances, autophagy may suppress tumorigenesis; however, under the stressful conditions of late-stage tumor growth, autophagy actually protects tumor cells, so inhibiting autophagy in these cases also inhibits tumor growth and promotes tumor cell death. Methods AML gene expression profile data and corresponding clinical data were obtained from the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, from which prognostic-related genes were screened to construct a risk score model through LASSO and univariate and multivariate Cox analyses. Then the model was verified in the TCGA and GEO cohorts. In addition, we also analyzed the relationship between autophagy genes and immune infiltrating cells and therapeutic drugs. Results We built a model containing 10 autophagy-related genes to predict the survival of AML patients by dividing them into high- or low-risk subgroups. The high-risk subgroup was prone to a poorer prognosis in both the training TCGA-LAML cohort and the validation GSE37642 cohort. Univariate and multivariate Cox analysis revealed that the risk score of the autophagy model can be used as an independent prognostic factor. The high-risk subgroup had not only higher fractions of CD4 naïve T cell, NK cell activated, and resting mast cells but also higher expression of immune checkpoint genes CTLA4 and CD274. Last, we screened drug sensitivity between high- and low-risk subgroups. Conclusion The risk score model based on 10 autophagy-related genes can serve as an effective prognostic predictor for AML patients and may guide patient stratification for immunotherapies and drugs.
Affiliation(s)
- Li Huang
- Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Lier Lin
- Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Xiangjun Fu
- Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
- Can Meng
- Department of Hematology, Hainan General Hospital (Hainan Affiliated Hospital of Hainan Medical University), Haikou, China
21
Choi Y, Aum J, Lee SH, Kim HK, Kim J, Shin S, Jeong JY, Ock CY, Lee HY. Deep Learning Analysis of CT Images Reveals High-Grade Pathological Features to Predict Survival in Lung Adenocarcinoma. Cancers (Basel) 2021; 13:4077. [PMID: 34439230 PMCID: PMC8391458 DOI: 10.3390/cancers13164077] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/24/2021] [Revised: 08/02/2021] [Accepted: 08/09/2021] [Indexed: 01/18/2023] Open
Abstract
We aimed to develop a deep learning (DL) model for predicting high-grade patterns in lung adenocarcinomas (ADC) and to assess the prognostic performance of the model in advanced lung cancer patients who underwent neoadjuvant or definitive concurrent chemoradiation therapy (CCRT). We included 275 patients with 290 early lung ADCs from an ongoing prospective clinical trial in the training dataset, which we split into internal-training and internal-validation datasets. We constructed a diagnostic DL model of high-grade patterns of lung ADC that considers both the morphologic view of the tumor and the context view of the area surrounding the tumor (MC3DN; morphologic-view context-view 3D network). Validation was performed on an independent dataset of 417 patients with advanced non-small cell lung cancer who underwent neoadjuvant or definitive CCRT. The area under the curve value of the DL model was 0.8 for the prediction of high-grade histologic patterns such as micropapillary and solid patterns (MPSol). When our model was applied to the validation set, a high probability of MPSol was associated with worse overall survival (probability of MPSol >0.5 vs. <0.5; 5-year OS rate 56.1% vs. 70.7%), indicating that our model could predict the clinical outcomes of advanced lung cancer patients. The subgroup with a high probability of MPSol estimated by the DL model showed a 1.76-fold higher risk of death (HR 1.76, 95% CI 1.16-2.68). Our DL model can be useful in estimating high-grade histologic patterns in lung ADCs and predicting clinical outcomes of patients with advanced lung cancer who underwent neoadjuvant or definitive CCRT.
Affiliation(s)
- Yeonu Choi
- Department of Radiology, Sungkyunkwan University School of Medicine (SKKU-SOM), Samsung Medical Center, Seoul 06351, Korea
- Jaehong Aum
- Lunit Inc., Seoul 06241, Korea
- Se-Hoon Lee
- Division of Hemato-Oncology, Department of Medicine, Sungkyunkwan University School of Medicine (SKKU-SOM), Samsung Medical Center, Seoul 06351, Korea
- Hong-Kwan Kim
- Department of Thoracic Surgery, Sungkyunkwan University School of Medicine (SKKU-SOM), Samsung Medical Center, Seoul 06351, Korea
- Jhingook Kim
- Department of Thoracic Surgery, Sungkyunkwan University School of Medicine (SKKU-SOM), Samsung Medical Center, Seoul 06351, Korea
- Ji Yun Jeong
- Department of Pathology, Kyungpook National University School of Medicine, Kyungpook National University Chilgok Hospital, Daegu 41404, Korea
- Ho Yun Lee
- Department of Radiology, Sungkyunkwan University School of Medicine (SKKU-SOM), Samsung Medical Center, Seoul 06351, Korea
22
Ahn H, Yeo I. Deep-Learning-Based Approach to Anomaly Detection Techniques for Large Acoustic Data in Machine Operation. SENSORS (BASEL, SWITZERLAND) 2021; 21:5446. [PMID: 34450888 PMCID: PMC8400866 DOI: 10.3390/s21165446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Revised: 08/09/2021] [Accepted: 08/11/2021] [Indexed: 11/24/2022]
Abstract
As the workforce shrinks, the demand for automatic, labor-saving, anomaly detection technology that can perform maintenance on advanced equipment such as vehicles has been increasing. In a vehicular environment, noise in the cabin, which directly affects users, is considered an important factor in lowering the emotional satisfaction of the driver and/or passengers in the vehicles. In this study, we provide an efficient method that can collect acoustic data, measured using a large number of microphones, in order to detect abnormal operations inside the machine via deep learning in a quick and highly accurate manner. Unlike most current approaches based on Long Short-Term Memory (LSTM) or autoencoders, we propose an anomaly detection (AD) algorithm that can overcome the limitations of noisy measurement and detection system anomalies via noise signals measured inside the mechanical system. These features are utilized to train a variety of anomaly detection models for demonstration in noisy environments with five different errors in machine operation, achieving an accuracy of approximately 90% or more.
Affiliation(s)
- Hyojung Ahn
- Korea Aerospace Research Institute, Daejeon 34133, Korea
- Inchoon Yeo
- Fourgoodcompany Co., Ltd., Sejong 30130, Korea
23
Ji C, Liu Z, Wang Y, Ni J, Zheng C. GATNNCDA: A Method Based on Graph Attention Network and Multi-Layer Neural Network for Predicting circRNA-Disease Associations. Int J Mol Sci 2021; 22:8505. [PMID: 34445212 PMCID: PMC8395191 DOI: 10.3390/ijms22168505] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 07/21/2021] [Revised: 07/30/2021] [Accepted: 08/03/2021] [Indexed: 12/30/2022] Open
Abstract
Circular RNAs (circRNAs) are a new class of endogenous non-coding RNAs with a covalently closed loop structure. Researchers have revealed that circRNAs play an important role in human diseases. As experimental identification of interactions between circRNA and disease is time-consuming and expensive, effective computational methods are urgently needed to predict potential circRNA-disease associations. In this study, we proposed a novel computational method named GATNNCDA, which combines Graph Attention Network (GAT) and multi-layer neural network (NN) to infer disease-related circRNAs. Specifically, GATNNCDA first integrates disease semantic similarity, circRNA functional similarity and the respective Gaussian Interaction Profile (GIP) kernel similarities. The integrated similarities are used as initial node features, and then GAT is applied for further feature extraction in the heterogeneous circRNA-disease graph. Finally, the NN-based classifier is introduced for prediction. The results of fivefold cross validation demonstrated that GATNNCDA achieved an average AUC of 0.9613 and AUPR of 0.9433 on the CircR2Disease dataset, and outperformed other state-of-the-art methods. In addition, case studies on breast cancer and hepatocellular carcinoma showed that 20 and 18 of the top 20 candidates, respectively, were confirmed in the validation datasets or published literature. Therefore, GATNNCDA is an effective and reliable tool for discovering circRNA-disease associations.
Affiliation(s)
- Cunmei Ji
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Zhihao Liu
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Yutian Wang
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Jiancheng Ni
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Chunhou Zheng
- School of Artificial Intelligence, Anhui University, Hefei 230601, China
24
Liu X, Luo Y, He T, Ren M, Xu Y. Predicting essential genes of 37 prokaryotes by combining information-theoretic features. J Microbiol Methods 2021; 188:106297. [PMID: 34343487 DOI: 10.1016/j.mimet.2021.106297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2021] [Revised: 07/30/2021] [Accepted: 07/30/2021] [Indexed: 10/20/2022]
Abstract
Essential genes are required for the reproduction and survival of an organism. Rapid identification of essential genes has practical application value in biomedicine. Information theory is a discipline that studies information transmission. Based on the similarity between heredity and information transmission, measures derived from information theory can be applied to genetic sequence analysis on different scales. In this study, we employed 114 features extracted by information theory methods to construct an essential gene prediction model. We applied a backpropagation neural network to construct a classifier and employed it to predict essential genes of 37 prokaryotes. The performance of the classifier was evaluated by applying intra-organism prediction and leave-one-species-out prediction. Among 37 prokaryotes, intra-organism prediction and leave-one-species-out prediction yielded average AUC scores of 0.791 and 0.717, respectively. Considering the potential redundancy in the feature set, we performed feature selection and constructed a key feature subset. In the above two prediction methods, the average AUC scores of 37 organisms obtained by using key features were 0.786 and 0.714, respectively. The results show the potential and universality of information-theoretic features in the study of prokaryotic essential gene prediction.
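The leave-one-species-out evaluation named in this abstract can be sketched as follows: every gene from one organism is held out for testing while the classifier is trained on genes from all remaining organisms, and this is repeated for each organism. The `(species, gene)` record layout and the function name are hypothetical and chosen only for illustration; the authors' actual data structures are not described in the abstract.

```python
from collections import defaultdict


def leave_one_species_out(records):
    """Yield (held_out_species, train_genes, test_genes) for each species.

    `records` is an iterable of (species, gene) pairs (hypothetical layout).
    """
    by_species = defaultdict(list)
    for species, gene in records:
        by_species[species].append(gene)
    for held_out in by_species:
        train = [
            gene
            for species, genes in by_species.items()
            if species != held_out
            for gene in genes
        ]
        yield held_out, train, by_species[held_out]
```

Intra-organism prediction, by contrast, would split the genes of a single species into training and test portions, which is why it tends to score higher than the cross-species setting (0.791 versus 0.717 average AUC in this abstract).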
Collapse
Affiliation(s)
- Xiao Liu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China.
| | - Yachuan Luo
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Ting He
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Meixiang Ren
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| | - Yuqiao Xu
- School of Microelectronics and Communication Engineering, Chongqing University, 174 ShaPingBa District, Chongqing 400044, China
| |
|
25
|
Chen C, Shi H, Jiang Z, Salhi A, Chen R, Cui X, Yu B. DNN-DTIs: Improved drug-target interactions prediction using XGBoost feature selection and deep neural network. Comput Biol Med 2021; 136:104676. [PMID: 34375902 DOI: 10.1016/j.compbiomed.2021.104676] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 03/13/2021] [Revised: 07/18/2021] [Accepted: 07/19/2021] [Indexed: 02/03/2023]
Abstract
Analysis and prediction of drug-target interactions (DTIs) play an important role in understanding drug mechanisms, as well as drug repositioning and design. Machine learning (ML)-based methods for DTIs prediction can mitigate the shortcomings of time-consuming and labor-intensive experimental approaches, while providing new ideas and insights for drug design. We propose a novel pipeline for predicting drug-target interactions, called DNN-DTIs. First, the target information is characterized by a number of features, namely, pseudo-amino acid composition, pseudo position-specific scoring matrix, conjoint triad composition, transition and distribution, Moreau-Broto autocorrelation, and structural features. The drug compounds are subsequently encoded using substructure fingerprints. Next, eXtreme gradient boosting (XGBoost) is used to determine the subset of non-redundant features of importance. The optimal balanced set of sample vectors is obtained by applying the synthetic minority oversampling technique (SMOTE). Finally, a DTIs predictor, DNN-DTIs, is developed based on a deep neural network (DNN) via a layer-by-layer learning scheme. Experimental results indicate that DNN-DTIs achieves better performance than other state-of-the-art predictors with ACC values of 98.78%, 98.60%, 97.98%, 98.24% and 98.00% on Enzyme, Ion Channels (IC), GPCR, Nuclear Receptors (NR) and Kuang's datasets. Therefore, the accurate prediction performance of DNN-DTIs makes it a favored choice for contributing to the study of DTIs, especially drug repositioning.
Affiliation(s)
- Cheng Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
| | - Han Shi
- Key Laboratory of Synthetic Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
| | - Zhiwen Jiang
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Adil Salhi
- Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Ruixin Chen
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China
| | - Xuefeng Cui
- School of Computer Science and Technology, Shandong University, Qingdao, 266237, China
| | - Bin Yu
- College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; Artificial Intelligence and Biomedical Big Data Research Center, Qingdao University of Science and Technology, Qingdao, 266061, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, China.
| |
|
26
|
Prediction of African Swine Fever Virus Inhibitors by Molecular Docking-Driven Machine Learning Models. Molecules 2021; 26:molecules26123592. [PMID: 34208385 PMCID: PMC8231271 DOI: 10.3390/molecules26123592] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 04/17/2021] [Revised: 05/23/2021] [Accepted: 06/09/2021] [Indexed: 01/31/2023] Open
Abstract
African swine fever virus (ASFV) causes a highly contagious and severe hemorrhagic viral disease with high mortality in domestic pigs of all ages. Although the virus is harmless to humans, the ongoing ASFV epidemic could have severe economic consequences for global food security. Recent studies have found a few antiviral agents that can inhibit ASFV infections. However, currently, there are no vaccines or antiviral drugs. Hence, there is an urgent need to identify new drugs to treat ASFV. Based on the structural information data on the targets of ASFV, we used molecular docking and machine learning models to identify novel antiviral agents. We confirmed that compounds with high affinity present in the region of interest belonged to subsets in the chemical space using principal component analysis and k-means clustering in molecular docking studies of FDA-approved drugs. These methods predicted pentagastrin as a potential antiviral drug against ASFVs. Finally, it was also observed that the compound had an inhibitory effect on AsfvPolX activity. Results from the present study suggest that molecular docking and machine learning models can play an important role in identifying potential antiviral drugs against ASFVs.
|
27
|
Iuchi H, Matsutani T, Yamada K, Iwano N, Sumi S, Hosoda S, Zhao S, Fukunaga T, Hamada M. Representation learning applications in biological sequence analysis. Comput Struct Biotechnol J 2021; 19:3198-3208. [PMID: 34141139 PMCID: PMC8190442 DOI: 10.1016/j.csbj.2021.05.039] [Citation(s) in RCA: 35] [Impact Index Per Article: 8.8] [Received: 02/26/2021] [Revised: 05/10/2021] [Accepted: 05/20/2021] [Indexed: 12/16/2022] Open
Abstract
Although remarkable advances have been reported in high-throughput sequencing, the ability to aptly analyze a substantial amount of rapidly generated biological (DNA/RNA/protein) sequencing data remains a critical hurdle. To tackle this issue, the application of natural language processing (NLP) to biological sequence analysis has received increased attention. In this method, biological sequences are regarded as sentences while the single nucleic acids/amino acids or k-mers in these sequences represent the words. Embedding is an essential step in NLP, which performs the conversion of these words into vectors. Specifically, representation learning is an approach used for this transformation process, which can be applied to biological sequences. Vectorized biological sequences can then be applied for function and structure estimation, or as input for other probabilistic models. Considering the importance and growing trend for the application of representation learning to biological research, in the present study, we have reviewed the existing knowledge in representation learning for biological sequence analysis.
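The "sequence as sentence, k-mer as word" idea can be sketched in a few lines. The example below splits a sequence into overlapping k-mers and assigns each distinct k-mer an integer ID, which is the usual input to an embedding layer. The function names and the choice of overlapping (stride-1) k-mers are assumptions for illustration; the review covers many tokenization and embedding variants.

```python
from typing import Dict, List


def kmer_tokens(sequence: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (the 'words')."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(tokens: List[str]) -> Dict[str, int]:
    """Assign each distinct k-mer an integer ID in order of first appearance."""
    vocab: Dict[str, int] = {}
    for token in tokens:
        vocab.setdefault(token, len(vocab))
    return vocab


# Hypothetical example sequence.
tokens = kmer_tokens("ATGCATG", k=3)
vocab = build_vocab(tokens)
token_ids = [vocab[t] for t in tokens]
```

The integer IDs would then index rows of a learned embedding matrix, so that each k-mer is represented by a dense vector.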
Affiliation(s)
- Hitoshi Iuchi
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
| | - Taro Matsutani
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Keisuke Yamada
- School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Natsuki Iwano
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Shunsuke Sumi
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Department of Life Science Frontiers, Center for iPS Cell Research and Application, Kyoto University, Kyoto 606-8507, Japan
| | - Shion Hosoda
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Shitao Zhao
- Waseda Research Institute for Science and Engineering, Waseda University, Tokyo 169-8555, Japan
| | - Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo 169-0051, Japan
- Department of Computer Science, Graduate School of Information Science and Technology, The University of Tokyo, Tokyo 113-0032, Japan
| | - Michiaki Hamada
- Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo 169-8555, Japan
- Graduate School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- School of Advanced Science and Engineering, Waseda University, Tokyo 169-8555, Japan
- Graduate School of Medicine, Nippon Medical School, Tokyo 113-8602, Japan
| |
|
28
|
Automatic Detection of Atrial Fibrillation in ECG Using Co-Occurrence Patterns of Dynamic Symbol Assignment and Machine Learning. SENSORS 2021; 21:s21103542. [PMID: 34069717 PMCID: PMC8161329 DOI: 10.3390/s21103542] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 03/20/2021] [Revised: 05/04/2021] [Accepted: 05/07/2021] [Indexed: 11/20/2022]
Abstract
Early detection of atrial fibrillation from electrocardiography (ECG) plays a vital role in the timely prevention and diagnosis of cardiovascular diseases. Various algorithms have been proposed; however, they do not adequately consider varied-length signals, morphological transitions, and abnormalities over long-term recordings. We propose dynamic symbolic assignment (DSA) to differentiate a normal sinus rhythm (SR) from paroxysmal atrial fibrillation (PAF). We use ECG signals and their interbeat (RR) intervals from two public databases, namely the AF Prediction Challenge Database (AFPDB) and the AF Termination Challenge Database (AFTDB). We transform RR intervals into a symbolic representation and compute co-occurrence matrices. The DSA feature is extracted using varied symbol length V and word size W, and is applied to five machine learning algorithms for classification. We test five hypotheses: (i) DSA captures the dynamics of the series, (ii) DSA is a reliable technique for various databases, (iii) optimal parameters improve DSA’s performance, (iv) DSA is consistent for variable signal lengths, and (v) DSA supports cross-data analysis. Our method captures the transition patterns of the RR intervals. The DSA feature exhibits a statistically significant difference between SR and PAF conditions (p < 0.005). The DSA feature with W=3 and V=3 yields maximum performance. In terms of F-measure (F), rotation forest and ensemble learning classifier are the most accurate for AFPDB (F = 94.6%) and AFTDB (F = 99.8%). Our method is effective for short-length signals and supports cross-data analysis. The DSA is capable of capturing the dynamics of varied-length ECG signals. In particular, the optimal parameters-based DSA feature and ensemble learning could help to detect PAF in long-term ECG signals. Our method maps time series into a symbolic representation and identifies abnormalities in noisy, varied-length, and pathological ECG signals.
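The two steps named in the abstract, mapping RR intervals to symbols and counting symbol co-occurrences, can be sketched as follows. This is a simplified stand-in, not the authors' DSA: it assumes fixed equal-width bins over the observed range and counts only adjacent symbol pairs, so the word size W is not modeled. The function names and the RR values (in milliseconds) are hypothetical.

```python
from typing import List


def symbolize(rr_ms: List[float], n_symbols: int = 3) -> List[int]:
    """Map each RR interval to a symbol 0..n_symbols-1 using equal-width bins.

    Assumption: static equal-width binning, unlike the dynamic assignment
    described in the abstract.
    """
    lo, hi = min(rr_ms), max(rr_ms)
    if hi == lo:
        return [0] * len(rr_ms)
    width = (hi - lo) / n_symbols
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    return [min(int((x - lo) / width), n_symbols - 1) for x in rr_ms]


def cooccurrence(symbols: List[int], n_symbols: int = 3) -> List[List[int]]:
    """Count how often symbol a is immediately followed by symbol b."""
    matrix = [[0] * n_symbols for _ in range(n_symbols)]
    for a, b in zip(symbols, symbols[1:]):
        matrix[a][b] += 1
    return matrix


# Hypothetical RR intervals in milliseconds.
rr = [600, 900, 1200, 600]
symbols = symbolize(rr, n_symbols=3)
matrix = cooccurrence(symbols, n_symbols=3)
```

The flattened matrix (or statistics derived from it) would serve as the feature vector passed to a classifier.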
|
29
|
Construction of a Soundscape-Based Media Art Exhibition to Improve User Appreciation Experience by Using Deep Neural Networks. ELECTRONICS 2021. [DOI: 10.3390/electronics10101170] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
The objective of this study was to improve user experience when appreciating visual artworks with soundscape music chosen by a deep neural network based on weakly supervised learning. We also propose a multi-faceted approach to measuring ambiguous concepts, such as the subjective fitness, implicit senses, immersion, and availability. We showed improvements in appreciation experience, such as the metaphorical and psychological transferability, time distortion, and cognitive absorption, with in-depth experiments involving 70 participants. Our test results were similar to those of “Bunker de Lumières: van Gogh”, which is an immersive media artwork directed by Gianfranco Iannuzzi; the fitness scores of our system and “Bunker de Lumières: van Gogh” were 3.68/5 and 3.81/5, respectively. Moreover, the concordance of implicit senses between artworks and classical music was measured to be 0.88%, and the time distortion and cognitive absorption improved during the immersion. Finally, the proposed method obtained a subjective satisfaction score of 3.53/5 in the evaluation of its usability. Our proposed method can also help spread soundscape-based media art by supporting traditional soundscape design. Furthermore, we hope that our proposed method will help people with visual impairments to appreciate artworks through its application to a multi-modal media art guide platform.
|
30
|
Vaškevičius M, Kapočiūtė-Dzikienė J, Šlepikas L. Prediction of Chromatography Conditions for Purification in Organic Synthesis Using Deep Learning. Molecules 2021; 26:2474. [PMID: 33922736 PMCID: PMC8123027 DOI: 10.3390/molecules26092474] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/17/2021] [Revised: 04/15/2021] [Accepted: 04/22/2021] [Indexed: 01/27/2023] Open
Abstract
In this research, a process for developing normal-phase liquid chromatography solvent systems has been proposed. In contrast to the development of conditions via thin-layer chromatography (TLC), this process is based on the architecture of two hierarchically connected neural network-based components. Using a large database of reaction procedures allows those two components to perform an essential role in the machine-learning-based prediction of chromatographic purification conditions, i.e., solvents and the ratio between solvents. In our paper, we build two datasets and test various molecular vectorization approaches, such as extended-connectivity fingerprints, learned embedding, and auto-encoders along with different types of deep neural networks to demonstrate a novel method for modeling chromatographic solvent systems employing two neural networks in sequence. Afterward, we present our findings and provide insights on the most effective methods for solving prediction tasks. Our approach results in a system of two neural networks with long short-term memory (LSTM)-based auto-encoders, where the first predicts solvent labels (reaching a classification accuracy of 0.950 ± 0.001) and, in the case of two solvents, the second predicts the ratio between the two solvents (R² metric equal to 0.982 ± 0.001). Our approach can be used as a guidance instrument in laboratories to accelerate scouting for suitable chromatography conditions.
Affiliation(s)
- Mantas Vaškevičius
- Department of Applied Informatics, Vytautas Magnus University, LT-44404 Kaunas, Lithuania;
- JSC Synhet, Biržų Str. 6, LT-44139 Kaunas, Lithuania;
|
31
|
MET Exon 14 Skipping: A Case Study for the Detection of Genetic Variants in Cancer Driver Genes by Deep Learning. Int J Mol Sci 2021; 22:ijms22084217. [PMID: 33921709 PMCID: PMC8072630 DOI: 10.3390/ijms22084217] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/25/2021] [Revised: 04/13/2021] [Accepted: 04/17/2021] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND Disruption of alternative splicing (AS) is frequently observed in cancer and might represent an important signature for tumor progression and therapy. Exon skipping (ES) represents one of the most frequent AS events, and in non-small cell lung cancer (NSCLC) MET exon 14 skipping was shown to be targetable. METHODS We constructed neural networks (NN/CNN) specifically designed to detect MET exon 14 skipping events using RNAseq data. Furthermore, for discovery purposes we also developed a sparsely connected autoencoder to identify uncharacterized MET isoforms. RESULTS The neural networks had a MET exon 14 skipping detection rate greater than 94% when tested on a manually curated set of 690 TCGA bronchus and lung samples. When globally applied to 2605 TCGA samples, we observed that the majority of false positives were characterized by a blurry coverage of exon 14; interestingly, they share a common coverage peak in the second intron, and we speculate that this event could be the transcription signature of a LINE1 (Long Interspersed Nuclear Element 1)-MET (Mesenchymal Epithelial Transition receptor tyrosine kinase) fusion. CONCLUSIONS Taken together, our results indicate that neural networks can be an effective tool to provide a quick classification of pathological transcription events, and sparsely connected autoencoders could represent the basis for the development of an effective discovery tool.
|
32
|
Abstract
Tuberculosis (TB) is an airborne infectious disease caused by organisms in the Mycobacterium tuberculosis (Mtb) complex. In many low and middle-income countries, TB remains a major cause of morbidity and mortality. Once a patient has been diagnosed with TB, it is critical that healthcare workers make the most appropriate treatment decision given the individual conditions of the patient and the likely course of the disease based on medical experience. Depending on the prognosis, delayed or inappropriate treatment can result in unsatisfactory results including the exacerbation of clinical symptoms, poor quality of life, and increased risk of death. This work benchmarks machine learning models to aid TB prognosis using a Brazilian health database of confirmed cases and deaths related to TB in the State of Amazonas. The goal is to predict the probability of death by TB thus aiding the prognosis of TB and associated treatment decision making process. In its original form, the data set comprised 36,228 records and 130 fields but suffered from missing, incomplete, or incorrect data. Following data cleaning and preprocessing, a revised data set was generated comprising 24,015 records and 38 fields, including 22,876 reported cured TB patients and 1139 deaths by TB. To explore how the data imbalance impacts model performance, two controlled experiments were designed using (1) imbalanced and (2) balanced data sets. The best result is achieved by the Gradient Boosting (GB) model using the balanced data set to predict TB-mortality, and the ensemble model composed by the Random Forest (RF), GB and Multi-Layer Perceptron (MLP) models is the best model to predict the cure class.
|
33
|
A Deep Learning Method for 3D Object Classification and Retrieval Using the Global Point Signature Plus and Deep Wide Residual Network. SENSORS 2021; 21:s21082644. [PMID: 33918845 PMCID: PMC8070544 DOI: 10.3390/s21082644] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/14/2021] [Revised: 04/04/2021] [Accepted: 04/05/2021] [Indexed: 12/23/2022]
Abstract
A vital and challenging task in computer vision is 3D Object Classification and Retrieval, with many practical applications such as intelligent robots, autonomous driving, multimedia content processing and retrieval, and augmented/mixed reality. Various deep learning methods have been introduced for solving classification and retrieval problems of 3D objects. Almost all view-based methods use many views to handle spatial loss, although they perform the best among current techniques such as View-based, Voxelization, and Point Cloud methods. Many views make the network structure more complicated due to the parallel Convolutional Neural Network (CNN). We propose a novel method that combines a Global Point Signature Plus with a Deep Wide Residual Network, namely GPSP-DWRN, in this paper. Global Point Signature Plus (GPSPlus) is a novel descriptor because it can capture more shape information of the 3D object for a single view. First, an original 3D model was converted into a colored one by applying GPSPlus. Then, a 32 × 32 × 3 matrix stored the obtained 2D projection of this color 3D model. This matrix was the input data of a Deep Residual Network, which used a single CNN structure. We evaluated the GPSP-DWRN for a retrieval task using the ShapeNetCore55 dataset, and for a classification task using two well-known datasets, ModelNet10 and ModelNet40. Based on our experimental results, our framework performed better than the state-of-the-art methods.
|
34
|
Multi-Data Aspects of Protein Similarity with a Learning Technique to Identify Drug-Disease Associations. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11072914] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/12/2023]
Abstract
Drug repositioning has been proposed to develop drugs for diseases. However, the similarity in a single aspect may not be sufficient to reveal hidden information. Therefore, we established protein–protein similarity vectors (PPSVs) based on potential similarities in various types of biological information associated with proteins, including their network topology, proteomic data, functional analysis, and druggable property. Based on the proposed PPSVs, a separate drug–disease matrix was constructed for each individual disease to prevent characteristics from being obscured between diseases. The classification technique was employed for prediction. The results showed that more than half of the tested disease models exhibited high performance, with overall F1 scores of more than 80%. Furthermore, comparing all diseases using traditional methods in one run, we obtained an area under the curve (AUC) of 98.9%. All candidate drugs were then tested in clinical trials (p-value < 2.2 × 10⁻¹⁶) and were known drugs based on their functions (p-value < 0.05). An analysis revealed that, in the functional aspect, the confidence value of an interaction in the protein–protein interaction network and the functional pathway score were the best descriptors for prediction. Based on the learning processes of PPSVs with an isolated disease, the classifier exhibited high performance in predicting and identifying new potential drugs for that disease.
|
35
|
Abstract
A novel coronavirus (COVID-19), which has become a great concern for the world, was first identified in Wuhan city in China. Its rapid spread throughout the world was accompanied by an alarming number of infected patients and a gradually increasing number of deaths. If the number of infected cases can be predicted in advance, this would contribute greatly to controlling the pandemic in any area. Therefore, this study introduces an integrated model for predicting the number of confirmed cases from the perspective of Bangladesh. Moreover, the number of quarantined patients and the change in basic reproduction rate (the R0-value) can also be evaluated using this model. This integrated model combines the SEIR (Susceptible, Exposed, Infected, Removed) epidemiological model and neural networks. The model was trained using available data from 250 days. The accuracy of the prediction of confirmed cases is approximately between 90% and 99%. The performance of this integrated model was evaluated by showing the difference in accuracy between the integrated model and the general SEIR model. The result shows that the integrated model is more accurate than the general SEIR model when predicting the number of confirmed cases in Bangladesh.
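For orientation, the general SEIR model that the abstract uses as its baseline can be simulated with a simple forward-Euler update of the standard textbook equations. The sketch below covers only that baseline, not the neural-network component; the parameter values (`beta`, `sigma`, `gamma`), the population size, and the function names are hypothetical and are not taken from the study.

```python
from typing import List, Tuple

State = Tuple[float, float, float, float]  # (S, E, I, R)


def seir_step(s: float, e: float, i: float, r: float,
              beta: float, sigma: float, gamma: float,
              dt: float = 1.0) -> State:
    """Advance the standard SEIR equations by one Euler step of length dt."""
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt   # S -> E
    new_infectious = sigma * e * dt       # E -> I
    new_removed = gamma * i * dt          # I -> R
    return (s - new_exposed,
            e + new_exposed - new_infectious,
            i + new_infectious - new_removed,
            r + new_removed)


def simulate(days: int, state: State,
             beta: float, sigma: float, gamma: float) -> List[State]:
    """Return the daily trajectory, including the initial state."""
    history = [state]
    for _ in range(days):
        state = seir_step(*state, beta, sigma, gamma)
        history.append(state)
    return history


# Hypothetical parameters; the implied basic reproduction number is beta / gamma = 3.
history = simulate(250, (999_990.0, 0.0, 10.0, 0.0),
                   beta=0.3, sigma=0.2, gamma=0.1)
```

In an integrated setup of the kind the abstract describes, a learned component could, for example, adjust such parameters over time; how the cited study couples the two is not specified here.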
|
36
|
Hybrid Deep Learning Models with Sparse Enhancement Technique for Detection of Newly Grown Tree Leaves. SENSORS 2021; 21:s21062077. [PMID: 33809537 PMCID: PMC8001602 DOI: 10.3390/s21062077] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/21/2021] [Revised: 03/04/2021] [Accepted: 03/12/2021] [Indexed: 12/21/2022]
Abstract
The life cycle of leaves, from sprout to senescence, is the phenomenon of regular changes such as budding, branching, leaf spreading, flowering, fruiting, leaf fall, and dormancy due to seasonal climate changes. It is the effect of temperature and moisture in the life cycle on physiological changes, so the detection of newly grown leaves (NGL) is helpful for the estimation of tree growth and even climate change. This study focused on the detection of NGL based on deep learning convolutional neural network (CNN) models with sparse enhancement (SE). As the NGL areas found in forest images have similar sparse characteristics, we used a sparse image to enhance the signal of the NGL. The difference between the NGL and the background could be further improved. We then proposed hybrid CNN models that combined U-net and SegNet features to perform image segmentation. As the NGL in the image were relatively small and tiny targets, in terms of data characteristics, they also belonged to the problem of imbalanced data. Therefore, this paper further proposed 3-Layer SegNet, 3-Layer U-SegNet, 2-Layer U-SegNet, and 2-Layer Conv-U-SegNet architectures to reduce the pooling degree of traditional semantic segmentation models, and used a loss function to increase the weight of the NGL. According to the experimental results, our proposed algorithms were indeed helpful for the image segmentation of NGL and could achieve better kappa results by 0.743.
|
37
|
Auliah FN, Nilamyani AN, Shoombuatong W, Alam MA, Hasan MM, Kurata H. PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations. Int J Mol Sci 2021; 22:ijms22042120. [PMID: 33672741 PMCID: PMC7924619 DOI: 10.3390/ijms22042120] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/16/2021] [Revised: 02/12/2021] [Accepted: 02/18/2021] [Indexed: 12/30/2022] Open
Abstract
Pupylation is a type of reversible post-translational modification of proteins, which plays a key role in the cellular function of microbial organisms. Several proteomics methods have been developed for the prediction and analysis of pupylated proteins and pupylation sites. However, the traditional experimental methods are laborious and time-consuming. Hence, computational algorithms that can predict potential pupylation sites using sequence features are highly needed. In this research, a new prediction model, PUP-Fuse, has been developed for pupylation site prediction by integrating multiple sequence representations. We explored five types of feature encoding approaches and three machine learning (ML) algorithms. In the final model, we integrated the successive ML scores using a linear regression model. PUP-Fuse achieved a Matthews correlation coefficient of 0.768 in a 10-fold cross-validation test. It also outperformed existing predictors in an independent test. The web server of PUP-Fuse with curated datasets is freely available.
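The Matthews correlation coefficient reported above is computed from the four cells of a binary confusion matrix. A minimal sketch follows; the function name and the example counts are hypothetical, and returning 0.0 when the denominator is zero is a common convention rather than something stated in the abstract.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: perfect, inverted, and chance-level predictions.
perfect = matthews_corrcoef(tp=10, tn=10, fp=0, fn=0)
inverted = matthews_corrcoef(tp=0, tn=0, fp=10, fn=10)
chance = matthews_corrcoef(tp=5, tn=5, fp=5, fn=5)
```

Because it uses all four cells, the coefficient stays informative on imbalanced data, where accuracy alone can look high for a classifier that mostly predicts the majority class.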
Affiliation(s)
- Firda Nurul Auliah
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Andi Nur Nilamyani
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand;
| | - Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA;
| | - Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
| | - Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan; (F.N.A.); (A.N.N.); (M.M.H.)
- Correspondence:
| |
|
38
|
Bai R, Jiang S, Sun H, Yang Y, Li G. Deep Neural Network-Based Semantic Segmentation of Microvascular Decompression Images. SENSORS 2021; 21:s21041167. [PMID: 33562275 PMCID: PMC7915571 DOI: 10.3390/s21041167] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/31/2020] [Revised: 01/26/2021] [Accepted: 02/02/2021] [Indexed: 11/30/2022]
Abstract
Image semantic segmentation has been applied more and more widely in the fields of satellite remote sensing, medical treatment, intelligent transportation, and virtual reality. However, in the medical field, the study of cerebral vessel and cranial nerve segmentation based on true-color medical images is in urgent need and has good research and development prospects. We have extended the current state-of-the-art semantic-segmentation network DeepLabv3+ and used it as the basic framework. First, the feature distillation block (FDB) was introduced into the encoder structure to refine the extracted features. In addition, the atrous spatial pyramid pooling (ASPP) module was added to the decoder structure to enhance the retention of feature and boundary information. The proposed model was trained by fine tuning and optimizing the relevant parameters. Experimental results show that the encoder structure has better performance in feature refinement processing, improving target boundary segmentation precision, and retaining more feature information. Our method has a segmentation accuracy of 75.73%, which is 3% better than DeepLabv3+.
Affiliation(s)
- Ruifeng Bai
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Shan Jiang
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
  - Correspondence: Tel.: +86-187-4401-2663
- Haijiang Sun
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
- Yifan Yang
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Guiju Li
  - Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China
|
39
|
SSnet: A Deep Learning Approach for Protein-Ligand Interaction Prediction. Int J Mol Sci 2021; 22:ijms22031392. [PMID: 33573266 PMCID: PMC7869013 DOI: 10.3390/ijms22031392] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 12/28/2020] [Revised: 01/24/2021] [Accepted: 01/27/2021] [Indexed: 12/15/2022] Open
Abstract
Computational prediction of Protein-Ligand Interaction (PLI) is an important step in the modern drug discovery pipeline as it mitigates the cost, time, and resources required to screen novel therapeutics. Deep Neural Networks (DNN) have recently shown excellent performance in PLI prediction. However, the performance is highly dependent on the protein and ligand features utilized for the DNN model. Moreover, in current models, deciphering how protein features determine the underlying principles that govern PLI is not trivial. In this work, we developed a DNN framework named SSnet that utilizes secondary structure information of proteins, extracted as the curvature and torsion of the protein backbone, to predict PLI. We demonstrate the performance of SSnet by comparing it against a variety of currently popular machine learning (ML) and non-ML models using various metrics. We visualize the intermediate layers of SSnet to show a potential latent space for proteins, in particular to extract structural elements in a protein that the model finds influential for ligand binding, which is one of the key features of SSnet. We observed in our study that SSnet learns information about locations in a protein where a ligand can bind, including binding sites, allosteric sites and cryptic sites, regardless of the conformation used. We further observed that SSnet is not biased toward any specific molecular interaction and extracts the protein fold information critical for PLI prediction. Our work forms an important gateway to the general exploration of secondary structure-based Deep Learning (DL), which is not confined to protein-ligand interactions, and as such will have a large impact on protein research, while being readily accessible to de novo drug designers as a standalone package.
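The abstract describes encoding a protein backbone by its curvature and torsion but gives no formulas. As a rough illustration, the sketch below estimates both quantities along a 3D polyline with central finite differences and the standard Frenet expressions, κ = |r′ × r″| / |r′|³ and τ = (r′ × r″) · r‴ / |r′ × r″|². The function name `curvature_torsion` and the finite-difference scheme are assumptions made here for illustration; the abstract does not say how SSnet computes these values, and the method actually used may differ.

```python
import math


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def curvature_torsion(points):
    """Estimate (curvature, torsion) at each interior point of a 3D polyline.

    `points` is a list of (x, y, z) tuples, e.g. backbone atom coordinates.
    Derivatives are taken with respect to the point index; both quantities
    are invariant to that choice of parametrization. The two points at each
    end are skipped because the third derivative needs a 5-point stencil.
    """
    result = []
    for i in range(2, len(points) - 2):
        pm2, pm1, p0, pp1, pp2 = points[i - 2:i + 3]
        d1 = tuple((a - b) / 2 for a, b in zip(pp1, pm1))
        d2 = tuple(a - 2 * b + c for a, b, c in zip(pp1, p0, pm1))
        d3 = tuple((a - 2 * b + 2 * c - d) / 2
                   for a, b, c, d in zip(pp2, pp1, pm1, pm2))
        c = _cross(d1, d2)
        c_norm2 = _dot(c, c)
        speed = math.sqrt(_dot(d1, d1))
        if speed == 0.0:
            # Degenerate stencil (coincident neighbors): report zeros.
            result.append((0.0, 0.0))
            continue
        kappa = math.sqrt(c_norm2) / speed ** 3
        tau = _dot(c, d3) / c_norm2 if c_norm2 > 0 else 0.0
        result.append((kappa, tau))
    return result
```

A quick sanity check is a densely sampled circular helix (a cos t, a sin t, b t), whose exact curvature and torsion are a/(a² + b²) and b/(a² + b²). Real backbone traces are coarsely sampled with sharp turns, so finite differences give only a crude estimate there.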
|
40
|
Herzog NJ, Magoulas GD. Brain Asymmetry Detection and Machine Learning Classification for Diagnosis of Early Dementia. Sensors 2021; 21:s21030778. [PMID: 33498908 PMCID: PMC7865614 DOI: 10.3390/s21030778] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 12/27/2020] [Revised: 01/20/2021] [Accepted: 01/21/2021] [Indexed: 11/30/2022]
Abstract
Early identification of degenerative processes in the human brain is considered essential for providing proper care and treatment. This may involve detecting structural and functional cerebral changes, such as changes in the degree of asymmetry between the left and right hemispheres. Changes can be detected by computational algorithms and used for the early diagnosis of dementia and its stages (amnestic early mild cognitive impairment (EMCI), Alzheimer’s Disease (AD)), and can help to monitor the progress of the disease. In this vein, the paper proposes a data processing pipeline that can be implemented on commodity hardware. It uses features of brain asymmetries, extracted from MRI scans in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database, for the analysis of structural changes and for machine learning classification of the pathology. The experiments provide promising results, distinguishing between subjects with normal cognition (NC) and patients with early or progressive dementia. The supervised machine learning algorithms and convolutional neural networks tested reach accuracies of 92.5% and 75.0% for NC vs. EMCI, and 93.0% and 90.5% for NC vs. AD, respectively. The proposed pipeline offers a promising low-cost alternative for the classification of dementia and is potentially useful for other degenerative brain disorders that are accompanied by changes in brain asymmetries.
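The abstract does not specify how the asymmetry features are computed. One simple way to quantify left-right asymmetry in a 2D slice, shown here only as an illustrative assumption, is to mirror the image about its vertical midline and average the absolute difference between the original and its mirror. The function names `mirror_lr` and `asymmetry_score` are hypothetical, and the sketch assumes the slice is already aligned so that the midline of the brain coincides with the image midline.

```python
def mirror_lr(image):
    """Flip a 2D image (list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def asymmetry_score(image):
    """Mean absolute difference between an image and its left-right mirror.

    Returns 0.0 for a perfectly symmetric image; larger values indicate
    stronger left-right asymmetry. An empty image also returns 0.0.
    """
    mirrored = mirror_lr(image)
    total = 0.0
    count = 0
    for row, mirrored_row in zip(image, mirrored):
        for a, b in zip(row, mirrored_row):
            total += abs(a - b)
            count += 1
    return total / count if count else 0.0
```

A score of this kind could serve as one input feature to a classifier; a practical pipeline would also need registration and intensity normalization, which are outside the scope of this sketch.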
Affiliation(s)
- Nitsa J. Herzog
  - Department of Computer Science, Birkbeck College, University of London, London WC1E 7HZ, UK
- George D. Magoulas
  - Department of Computer Science, Birkbeck College, University of London, London WC1E 7HZ, UK
  - Birkbeck Knowledge Lab, University of London, London WC1E 7HZ, UK
  - Correspondence:
|
41
|
Prediction of Protein-ATP Binding Residues Based on Ensemble of Deep Convolutional Neural Networks and LightGBM Algorithm. Int J Mol Sci 2021; 22:ijms22020939. [PMID: 33477866 PMCID: PMC7832895 DOI: 10.3390/ijms22020939] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 12/25/2020] [Revised: 01/13/2021] [Accepted: 01/16/2021] [Indexed: 12/13/2022] Open
Abstract
Accurately identifying protein-ATP binding residues is important for protein function annotation and drug design. Previous studies have used classic machine-learning algorithms such as support vector machine (SVM) and random forest to predict protein-ATP binding residues; however, as new machine-learning techniques are developed, the prediction performance can be further improved. In this paper, an ensemble predictor that combines deep convolutional neural networks and LightGBM through an ensemble learning algorithm is proposed. Three subclassifiers have been developed: a multi-incepResNet-based predictor, a multi-Xception-based predictor, and a LightGBM predictor. The final prediction result is the combination of the outputs from the three subclassifiers with an optimized weight distribution. We examined the performance of our proposed predictor using two datasets: a classic ATP-binding benchmark dataset and a newly proposed ATP-binding dataset. Our predictor achieved area under the curve (AUC) values of 0.925 and 0.902 and Matthews Correlation Coefficient (MCC) values of 0.639 and 0.642, respectively, both of which are better than those of other state-of-the-art prediction methods.
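The weighted combination of three subclassifier outputs can be sketched as follows. The abstract does not say how the weights were optimized, so this example assumes a plain grid search over weights that sum to 1, scored by MCC on labeled data. The function names `mcc`, `combine`, and `search_weights`, the grid resolution, and the 0.5 decision threshold are all illustrative assumptions rather than the authors' procedure.

```python
import math


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def combine(prob_lists, weights):
    """Weighted sum of per-sample probabilities from several classifiers."""
    return [sum(w * p for w, p in zip(weights, probs))
            for probs in zip(*prob_lists)]


def search_weights(prob_lists, y_true, steps=10, threshold=0.5):
    """Grid-search three non-negative weights summing to 1 that maximize MCC.

    `prob_lists` holds exactly three lists of predicted probabilities,
    one per subclassifier, aligned with `y_true`.
    """
    best_weights, best_score = None, -2.0
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j
            weights = (i / steps, j / steps, k / steps)
            scores = combine(prob_lists, weights)
            pred = [1 if s >= threshold else 0 for s in scores]
            score = mcc(y_true, pred)
            if score > best_score:
                best_weights, best_score = weights, score
    return best_weights, best_score
```

In practice the weights would be tuned on a validation split that is separate from the test data, so that the reported scores are not inflated by the search.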
|