1
Sun SL, Jiang YY, Yang JP, Xiu YH, Bilal A, Long HX. Predicting noncoding RNA and disease associations using multigraph contrastive learning. Sci Rep 2025; 15:230. [PMID: 39747154 PMCID: PMC11695719 DOI: 10.1038/s41598-024-81862-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Accepted: 11/29/2024] [Indexed: 01/04/2025] Open
Abstract
MiRNAs and lncRNAs are two essential noncoding RNAs. Predicting associations between noncoding RNAs and diseases can significantly improve the accuracy of early diagnosis. With the continuous breakthroughs in artificial intelligence, researchers increasingly use deep learning methods to predict associations. Nevertheless, most existing methods face two major issues: low prediction accuracy and the limitation of only being able to predict a single type of noncoding RNA-disease association. To address these challenges, this paper proposes a method called K-Means and multigraph Contrastive Learning for predicting associations among miRNAs, lncRNAs, and diseases (K-MGCMLD). The K-MGCMLD model is divided into four main steps. The first step is the construction of a heterogeneous graph. The second step involves downsampling using the K-means clustering algorithm to balance the positive and negative samples. The third step uses an encoder with a Graph Convolutional Network (GCN) architecture to extract embedding vectors; multigraph contrastive learning, including both local and global graph contrastive learning, helps the embedding vectors better capture the latent topological features of the graph. The fourth step involves feature reconstruction using the balanced positive and negative samples and the embedding vectors, which are fed into an XGBoost classifier for multi-association classification prediction. Experimental results show AUC values of 0.9542 for miRNA-disease association, 0.9603 for lncRNA-disease association, and 0.9687 for lncRNA-miRNA association. Additionally, case analyses using K-MGCMLD validated the associations of all of the top 30 miRNAs predicted to be associated with lung cancer and Alzheimer's disease.
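The second step (K-means downsampling of negative samples) can be illustrated with a minimal sketch. The abstract does not say how samples are drawn from the clusters, so the round-robin selection below, the function names `kmeans` and `downsample_negatives`, and the default `k=3` are all assumptions made for illustration, not the authors' implementation.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def downsample_negatives(negatives, n_keep, k=3, seed=0):
    """Keep n_keep negatives, drawing round-robin across k-means clusters
    (hypothetical selection rule) so every cluster stays represented."""
    labels = kmeans(negatives, k, seed=seed)
    rng = random.Random(seed)
    clusters = [[i for i, lab in enumerate(labels) if lab == c] for c in range(k)]
    for members in clusters:
        rng.shuffle(members)
    kept = []
    while len(kept) < n_keep and any(clusters):
        for members in clusters:
            if members and len(kept) < n_keep:
                kept.append(members.pop())
    return [negatives[i] for i in sorted(kept)]
```

Calling `downsample_negatives(negatives, n_keep=len(positives))` would yield a negative set the same size as the positive set, which is the balancing effect the abstract describes.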
Affiliation(s)
- Si-Lin Sun
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
- Yue-Yi Jiang
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
- Jun-Ping Yang
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
- Yu-Han Xiu
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
- Anas Bilal
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
- Hai-Xia Long
  - College of Information Science Technology, Hainan Normal University, Haikou, 571158, China
  - Key Laboratory of Data Science and Smart Education, Ministry of Education, Hainan Normal University, Haikou, 571158, China
2
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
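As a small illustration of the two ideas the abstract names, the sketch below counts overlapping k-mers in a sequence and lists the k-mers that never occur in it (the nullomers of that sequence). The function names and the DNA alphabet default are assumptions for the example; real nullomer studies work against whole genomes or proteomes rather than a single short string.

```python
from collections import Counter
from itertools import product

def count_kmers(seq, k):
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def nullomers(seq, k, alphabet="ACGT"):
    """Return, in sorted order, the k-mers over the alphabet that are absent from seq."""
    present = set(count_kmers(seq, k))
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in candidates if kmer not in present)
```

For example, `count_kmers("ACGTAC", 2)` counts `AC` twice and `CG`, `GT`, `TA` once each, so 12 of the 16 possible dinucleotides are reported as absent.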
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
3
Farhangniya M, Samadikuchaksaraei A, Mohamadi Farsani F. Exploring Co-expression Modules-Traits Correlation through Weighted Gene Co-expression Network Analysis: A Promising Approach in Wound Healing Research. Med J Islam Repub Iran 2024; 38:82. [PMID: 39678778 PMCID: PMC11644100 DOI: 10.47176/mjiri.38.82] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Indexed: 12/17/2024] Open
Abstract
Background The skin is the largest organ in the body and has several important functions in protection and regulation. However, wound development can disrupt the natural healing process, leading to challenges such as chronic wounds, persistent infections, and impaired angiogenesis. These issues not only affect individuals' well-being but also pose significant economic burdens on healthcare systems. Despite advancements in wound care research, managing chronic wounds remains a pressing concern. Understanding the complex genetic pathways involved in wound healing is crucial for developing effective therapeutic strategies and reducing the socio-economic impact of chronic wounds. Weighted Gene Co-Expression Network Analysis (WGCNA) offers a promising approach to uncovering key genes and modules associated with different stages of wound healing, providing valuable insights for targeted interventions to enhance tissue repair and promote efficient wound healing. Methods Microarray gene expression datasets were retrieved from the Gene Expression Omnibus website, with 65 series selected according to inclusion and exclusion criteria. Raw data were preprocessed using the Robust Multi-array Average (RMA) approach for background correction, normalization, and gene expression calculation. WGCNA was employed to identify co-expression patterns among genes associated with wound healing processes. This involved network construction, topological analysis, module identification, and association with clinical traits. Functional analysis included enrichment analysis and identification of hub genes through gene-gene functional interaction network analysis using the GeneMANIA database.
Results The WGCNA analysis indicated significant correlations between wound healing and the black, brown, and light green modules. These modules were further examined for their relevance to wound healing traits and subjected to functional enrichment analysis. A total of 16 genes were singled out as potential hub genes critical for wound healing. Examination of these hub genes revealed a gene-gene functional interaction network within the module network based on the KEGG enrichment database. The MAPK, EGFR, and ErbB signaling pathways, as well as essential cellular processes including autophagy and mitophagy, emerged as the most significant pathways. Conclusion We identified consensus modules relating to wound healing across nine microarray datasets. Among these, 16 hub genes were uncovered within the brown and black modules. KEGG enrichment analysis identified co-expressed genes within these modules and highlighted the pathways most closely associated with the development of wound healing traits, including autophagy and mitophagy. The hub genes identified in this study represent potential candidates for future research. These findings serve as a stepping stone toward further exploration of the implications of these co-expressed modules for wound healing traits.
Affiliation(s)
- Mansoureh Farhangniya
  - Cellular and Molecular Research Center, Faculty of Medicine, Iran University of Medical Sciences, Tehran, Iran
  - Health Metrics Research Center, Iranian Institute for Health Sciences Research, ACECR, Tehran, Iran
- Ali Samadikuchaksaraei
  - Department of Medical Biotechnology, Faculty of Allied Medicine, Iran University of Medical Sciences, Tehran, Iran
4
Li Y, Du C, Ge S, Zhang R, Shao Y, Chen K, Li Z, Ma F. Hematoma expansion prediction based on SMOTE and XGBoost algorithm. BMC Med Inform Decis Mak 2024; 24:172. [PMID: 38898499 PMCID: PMC11186182 DOI: 10.1186/s12911-024-02561-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Accepted: 05/30/2024] [Indexed: 06/21/2024] Open
Abstract
Hematoma expansion (HE) is a high-risk symptom with a high rate of occurrence among patients who have undergone spontaneous intracerebral hemorrhage (ICH) after a major accident or illness. Correctly predicting the occurrence of HE in advance is critical to help doctors determine the next step of medical treatment. Most existing studies focus only on the occurrence of HE within 6 h after the onset of ICH, while in reality a considerable number of patients develop HE after the first 6 h but within 24 h. In this study, based on medical doctors' recommendations, we focus on predicting the occurrence of HE within 24 h, as well as the occurrence of HE in every 6 h interval within 24 h. Based on demographics and information extracted from computed tomography (CT) images, we used the XGBoost method to predict the occurrence of HE within 24 h. To address the highly imbalanced data set, a frequent issue in medical data analysis, we used the SMOTE algorithm for data augmentation. To evaluate our method, we used a data set consisting of 582 patient records and compared the results of the proposed method with those of several other machine learning methods. Our experiments show that XGBoost achieved the best prediction performance on the balanced dataset processed by the SMOTE algorithm, with an accuracy of 0.82 and an F1-score of 0.82. Moreover, our proposed method predicts the occurrence of HE within 6, 12, 18 and 24 h with accuracies of 0.89, 0.82, 0.87 and 0.94, respectively, indicating that the occurrence of HE within 24 h can be predicted accurately by the proposed method.
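The core idea of SMOTE, synthesizing minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as follows. This is a simplified illustration written with the standard library only; the function name `smote`, its parameters, and the uniform choice of base samples are assumptions, and it is not the implementation used in the study.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by linear interpolation.

    minority: list of feature vectors (lists of floats), at least two of them.
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, the augmented class stays inside the region already occupied by the minority data.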
Affiliation(s)
- Yan Li
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Chaonan Du
  - Department of Neurosurgery, Affiliated Jinling Hospital, Medical School of Nanjing University, Nanjing, China
- Sikai Ge
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Ruonan Zhang
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Yiming Shao
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Keyu Chen
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Zhepeng Li
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
- Fei Ma
  - Department of Mathematics and Physics, Xi'an Jiaotong-Liverpool University, Suzhou, China
5
Bonomo M, Rombo SE. Neighborhood based computational approaches for the prediction of lncRNA-disease associations. BMC Bioinformatics 2024; 25:187. [PMID: 38741200 PMCID: PMC11089760 DOI: 10.1186/s12859-024-05777-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 04/11/2024] [Indexed: 05/16/2024] Open
Abstract
MOTIVATION Long non-coding RNAs (lncRNAs) are a class of molecules involved in important biological processes. Extensive efforts have been made to gain a deeper understanding of disease mechanisms at the lncRNA level, guiding the detection of biomarkers for disease diagnosis, treatment, prognosis and prevention. Unfortunately, due to cost and time constraints, the number of possible disease-related lncRNAs verified by traditional biological experiments is very limited. Computational approaches for the prediction of disease-lncRNA associations make it possible to identify the most promising candidates to be verified in the laboratory, reducing costs and time. RESULTS We propose novel approaches for the prediction of lncRNA-disease associations, all sharing the idea of exploring associations among lncRNAs, other intermediate molecules (e.g., miRNAs) and diseases, suitably represented by tripartite graphs. Indeed, while still only a few lncRNA-disease associations are known, plenty of interactions between lncRNAs and other molecules, as well as associations of the latter with diseases, are available. A first approach presented here, NGH, relies on neighborhood analysis performed on a tripartite graph built upon lncRNAs, miRNAs and diseases. A second approach (CF) relies on collaborative filtering; a third approach (NGH-CF) is obtained by boosting NGH with collaborative filtering. The proposed approaches have been validated on both synthetic and real data, and compared against other methods from the literature. Neighborhood analysis outperforms competitors, and when it is combined with collaborative filtering the prediction accuracy further improves, reaching an AUC of 0.966. AVAILABILITY Source code and sample datasets are available at: https://github.com/marybonomo/LDAsPredictionApproaches.git.
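The general idea of scoring lncRNA-disease pairs through shared intermediate molecules in a tripartite graph can be sketched as below. This is not the NGH algorithm itself, whose scoring function is not given in the abstract; the score used here (the fraction of an lncRNA's miRNA neighbours that are associated with the disease) and the name `neighborhood_scores` are assumptions chosen only to illustrate the neighborhood principle.

```python
def neighborhood_scores(lnc_mirna, mirna_disease):
    """Score (lncRNA, disease) pairs via shared miRNA neighbours.

    lnc_mirna: dict mapping each lncRNA to the set of miRNAs it interacts with.
    mirna_disease: dict mapping each miRNA to the set of diseases it is linked to.
    Returns a dict {(lncRNA, disease): score in (0, 1]} (hypothetical scoring rule).
    """
    counts = {}
    for lnc, mirnas in lnc_mirna.items():
        for mirna in mirnas:
            for disease in mirna_disease.get(mirna, ()):
                counts[(lnc, disease)] = counts.get((lnc, disease), 0) + 1
    # Normalise by the number of miRNA neighbours of the lncRNA.
    return {pair: n / len(lnc_mirna[pair[0]]) for pair, n in counts.items()}
```

Pairs with high scores would be the candidates ranked first for laboratory verification; the real methods add further weighting and, in NGH-CF, a collaborative filtering step.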
Affiliation(s)
- Simona E Rombo
  - Kazaam Lab s.r.l., Palermo, Italy
  - Department of Mathematics and Computer Science, University of Palermo, Palermo, Italy
6
Pelletier SJ, Leclercq M, Roux-Dalvai F, de Geus MB, Leslie S, Wang W, Lam TT, Nairn AC, Arnold SE, Carlyle BC, Precioso F, Droit A. BERNN: Enhancing classification of Liquid Chromatography Mass Spectrometry data with batch effect removal neural networks. Nat Commun 2024; 15:3777. [PMID: 38710683 PMCID: PMC11074280 DOI: 10.1038/s41467-024-48177-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 04/24/2024] [Indexed: 05/08/2024] Open
Abstract
Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for profiling complex biological samples. However, batch effects typically arise from differences in sample processing protocols, experimental conditions, and data acquisition techniques, significantly impacting the interpretability of results. Correcting batch effects is crucial for the reproducibility of omics research, but current methods are not optimal for the removal of batch effects without compressing the genuine biological variation under study. We propose a suite of Batch Effect Removal Neural Networks (BERNN) to remove batch effects in large LC-MS experiments, with the goal of maximizing sample classification performance between conditions. More importantly, these models must efficiently generalize in batches not seen during training. A comparison of batch effect correction methods across five diverse datasets demonstrated that BERNN models consistently showed the strongest sample classification performance. However, the model producing the greatest classification improvements did not always perform best in terms of batch effect removal. Finally, we show that the overcorrection of batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal while preserving valuable biological diversity in large-scale LC-MS experiments.
Affiliation(s)
- Simon J Pelletier
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Mickaël Leclercq
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Florence Roux-Dalvai
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
  - Proteomics Platform, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
- Matthijs B de Geus
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
  - Leiden University Medical Center, Leiden, The Netherlands
- Shannon Leslie
  - Yale Department of Psychiatry, New Haven, CT, USA
  - Janssen Pharmaceuticals, San Diego, CA, USA
- Weiwei Wang
  - Keck MS & Proteomics Resource, Yale School of Medicine, New Haven, CT, USA
- TuKiet T Lam
  - Keck MS & Proteomics Resource, Yale School of Medicine, New Haven, CT, USA
  - Yale School of Medicine, Department of Molecular Biophysics and Biochemistry, New Haven, CT, USA
- Steven E Arnold
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
- Becky C Carlyle
  - Massachusetts General Hospital Department of Neurology, Charlestown, MA, USA
  - Oxford University Department of Physiology Anatomy and Genetics, Oxford, UK
  - Kavli Institute for Nanoscience Discovery, Oxford, UK
- Frédéric Precioso
  - Université Côte d'Azur, CNRS, INRIA, I3S, Sophia Antipolis, Nice, France
- Arnaud Droit
  - Computational Biology Laboratory, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
  - Proteomics Platform, CHU de Québec - Université Laval Research Center, Québec City, QC, Canada
7
Dichio V, De Vico Fallani F. Exploration-Exploitation Paradigm for Networked Biological Systems. PHYSICAL REVIEW LETTERS 2024; 132:098402. [PMID: 38489647 DOI: 10.1103/physrevlett.132.098402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 01/24/2024] [Indexed: 03/17/2024]
Abstract
The stochastic exploration of the configuration space and the exploitation of functional states underlie many biological processes. The evolutionary dynamics stands out as a remarkable example. Here, we introduce a novel formalism that mimics evolution and encodes a general exploration-exploitation dynamics for biological networks. We apply it to the brain wiring problem, focusing on the maturation of that of the nematode Caenorhabditis elegans. We demonstrate that a parsimonious maxent description of the adult brain combined with our framework is able to track down the entire developmental trajectory.
Affiliation(s)
- Vito Dichio
  - Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, F-75013, Paris, France
- Fabrizio De Vico Fallani
  - Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, F-75013, Paris, France
8
Li G, Bai P, Liang C, Luo J. Node-adaptive graph Transformer with structural encoding for accurate and robust lncRNA-disease association prediction. BMC Genomics 2024; 25:73. [PMID: 38233788 PMCID: PMC10795365 DOI: 10.1186/s12864-024-09998-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 07/31/2023] [Accepted: 01/09/2024] [Indexed: 01/19/2024] Open
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) are integral to a plethora of critical cellular biological processes, including the regulation of gene expression, cell differentiation, and the development of tumors and cancers. Predicting the relationships between lncRNAs and diseases can contribute to a better understanding of the pathogenic mechanisms of disease and provide strong support for the development of advanced treatment methods. RESULTS We present an innovative Node-Adaptive Graph Transformer model for predicting unknown LncRNA-Disease Associations, named NAGTLDA. First, we utilize the node-adaptive feature smoothing (NAFS) method to learn the local feature information of nodes and encode the structural information of the fusion similarity network of diseases and lncRNAs using Structural Deep Network Embedding (SDNE). Next, the Transformer module is used to capture potential association information between the network nodes. Finally, we employ a Transformer module with two multi-headed attention layers for learning global-level embedding fusion. Network structure coding is added as the structural inductive bias of the network to compensate for the missing message-passing mechanism in the Transformer. NAGTLDA achieved an average AUC of 0.9531 and AUPR of 0.9537, significantly higher than state-of-the-art methods in 5-fold cross validation. We performed case studies on 4 diseases; 55 out of 60 associations between lncRNAs and diseases have been validated in the literature. The results demonstrate the enormous potential of the graph Transformer structure to incorporate graph structural information for uncovering unknown lncRNA-disease correlations. CONCLUSIONS Our proposed NAGTLDA model can serve as a highly efficient computational method for predicting biological information associations.
Affiliation(s)
- Guanghui Li
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Peihao Bai
  - School of Information Engineering, East China Jiaotong University, Nanchang, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
9
Wang H, Zeng W, Huang X, Liu Z, Sun Y, Zhang L. MTTLm6A: A multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. MATHEMATICAL BIOSCIENCES AND ENGINEERING : MBE 2024; 21:272-299. [PMID: 38303423 DOI: 10.3934/mbe.2024013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2024]
Abstract
N6-methyladenosine (m6A) is a crucial RNA modification involved in various biological activities. Computational methods have been developed for the detection of m6A sites in Saccharomyces cerevisiae at base resolution due to their cost-effectiveness and efficiency. However, the generalization of these methods has been hindered by limited base-resolution datasets. Additionally, RMBase contains a vast number of low-resolution m6A sites for Saccharomyces cerevisiae, and base-resolution sites are often inferred from these low-resolution results through post-calibration. We propose MTTLm6A, a multi-task transfer learning approach for base-resolution mRNA m6A site prediction based on an improved transformer. First, the RNA sequences are encoded using one-hot encoding. Then, we construct a multi-task model that combines a convolutional neural network with a multi-head-attention deep framework. This model not only detects low-resolution m6A sites but also assigns reasonable probabilities to the predicted sites. Finally, we employ transfer learning to predict base-resolution m6A sites based on the low-resolution m6A sites. Experimental results on Saccharomyces cerevisiae m6A and Homo sapiens m1A data demonstrate that MTTLm6A achieved area under the receiver operating characteristic curve (AUROC) values of 77.13% and 92.9%, respectively, outperforming the state-of-the-art models and indicating strong generalization ability. To enhance user convenience, we have made a user-friendly web server for MTTLm6A publicly available at http://47.242.23.141/MTTLm6A/index.php.
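The one-hot encoding step mentioned in the abstract can be illustrated with a short sketch. The base order `ACGU` and the all-zero row for unknown symbols are assumptions made for the example; the abstract does not state which conventions MTTLm6A uses.

```python
def one_hot_rna(seq, alphabet="ACGU"):
    """Encode an RNA sequence as a list of one-hot rows, one row per base."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:
            row[index[base]] = 1  # unknown symbols such as N stay all-zero
        encoded.append(row)
    return encoded
```

The resulting matrix has one row per nucleotide and one column per base, which is the kind of input a convolutional layer can slide over along the sequence axis.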
Affiliation(s)
- Honglei Wang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
  - School of Information Engineering, Xuzhou College of Industrial Technology, Xuzhou, China
- Wenliang Zeng
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Xiaoling Huang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Zhaoyang Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Yanjing Sun
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
10
Yeom HG, Lee BD, Lee W, Lee T, Yun JP. Estimating chronological age through learning local and global features of panoramic radiographs in the Korean population. Sci Rep 2023; 13:21857. [PMID: 38071386 PMCID: PMC10710476 DOI: 10.1038/s41598-023-48960-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 12/01/2023] [Indexed: 12/18/2023] Open
Abstract
This study suggests a hybrid method based on ResNet50 and vision transformer (ViT) in an age estimation model. To this end, panoramic radiographs are used for learning by considering both local features and global information, which is important in estimating age. Transverse and longitudinal panoramic images of 9663 patients were selected (4774 males and 4889 females with a mean age of 39 years and 3 months). To compare ResNet50, ViT, and the hybrid model, the mean absolute error, mean square error, root mean square error, and coefficient of determination (R2) were used as metrics. The results confirmed that the age estimation model designed using the hybrid method performed better than those using only ResNet50 or ViT. The estimation is highly accurate for young people at an age with distinct growth characteristics. When examining the basis for age estimation in the hybrid model through attention rollout, the proposed model used logical and important factors rather than relying on unclear elements as the basis for age estimation.
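The four evaluation metrics named in the abstract follow standard definitions, which a short sketch makes concrete. The function name `regression_metrics` and the example ages in the usage note are illustrative assumptions, not values from the study.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and the coefficient of determination (R2)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    return {"mae": mae, "mse": mse, "rmse": math.sqrt(mse), "r2": 1 - ss_res / ss_tot}
```

For true ages `[10, 20, 30]` and predictions `[12, 18, 33]`, the errors are 2, -2 and 3, so the MAE is 7/3, the MSE is 17/3, and R2 is 1 - 17/200.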
Affiliation(s)
- Han-Gyeol Yeom
  - Department of Oral and Maxillofacial Radiology and Wonkwang Dental Research Institute, College of Dentistry, Wonkwang University, Iksan, Republic of Korea
- Byung-Do Lee
  - Department of Oral and Maxillofacial Radiology and Wonkwang Dental Research Institute, College of Dentistry, Wonkwang University, Iksan, Republic of Korea
- Wan Lee
  - Department of Oral and Maxillofacial Radiology and Wonkwang Dental Research Institute, College of Dentistry, Wonkwang University, Iksan, Republic of Korea
- Taehan Lee
  - AI Research Center for Manufacturing Systems (AIMS), Korea Institute of Industrial Technology (KITECH), Daegu, 42994, Republic of Korea
- Jong Pil Yun
  - AI Research Center for Manufacturing Systems (AIMS), Korea Institute of Industrial Technology (KITECH), Daegu, 42994, Republic of Korea
  - University of Science and Technology, Daegu, Republic of Korea
11
Rush KL, Seaton CL, O’Connor BP, Andrade JG, Loewen P, Corman K, Burton L, Smith MA, Moroz L. Managing With Atrial Fibrillation: An Exploratory Model-Based Cluster Analysis of Clinical and Personal Patient Characteristics. CJC Open 2023; 5:833-845. [PMID: 38020332 PMCID: PMC10679453 DOI: 10.1016/j.cjco.2023.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 08/16/2023] [Indexed: 12/01/2023] Open
Abstract
Background Examining characteristics of patients with atrial fibrillation (AF) has the potential to help in identifying groups of patients who might benefit from different management approaches. Methods Secondary analysis of online survey data was combined with clinic referral data abstraction from 196 patients with AF attending an AF specialty clinic. Cluster analyses were performed to identify distinct, homogeneous clusters of AF patients defined by 11 relevant variables: CHA2DS2-VASc score, age, AF symptoms, overall health, mental health, AF knowledge, perceived stress, household and recreation activity, overall AF quality of life, and AF symptom treatment satisfaction. Follow-up analyses examined differences between the cluster groups in additional clinical variables. Results Evidence emerged for both 2- and 4-cluster solutions. The 2-cluster solution involved a contrast between patients who were doing well on all variables (n = 129; 66%) vs those doing less well (n = 67; 34%). The 4-cluster solution provided a closer-up view of the data, showing that the group doing less well was split into 3 meaningfully different subgroups of patients who were managing in different ways. The final 4 clusters produced were as follows: (i) doing well; (ii) stressed and discontented; (iii) struggling and dissatisfied; and (iv) satisfied and complacent. Conclusions Patients with AF can be accurately classified into distinct, natural groupings that vary in clinically important ways. Among the patients who were not managing well with AF, we found 3 distinct subgroups of patients who may benefit from tailored approaches to AF management and support. The tailoring of treatment approaches to specific personal and/or behavioural patterns, alongside clinical patterns, holds potential to improve patient outcomes (eg, treatment satisfaction).
Affiliation(s)
- Kathy L. Rush
  - School of Nursing, University of British Columbia—Okanagan, Kelowna, British Columbia, Canada
- Cherisse L. Seaton
  - School of Nursing, University of British Columbia—Okanagan, Kelowna, British Columbia, Canada
- Brian P. O’Connor
  - Department of Psychology, University of British Columbia—Okanagan, Kelowna, British Columbia, Canada
- Jason G. Andrade
  - Cardiac Atrial Fibrillation Specialty Clinic, Vancouver General Hospital, Vancouver, British Columbia, Canada
  - Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Peter Loewen
  - Faculty of Pharmaceutical Sciences, University of British Columbia—Vancouver, Vancouver, British Columbia, Canada
- Kendra Corman
  - School of Nursing, University of British Columbia—Okanagan, Kelowna, British Columbia, Canada
- Lindsay Burton
  - School of Nursing, University of British Columbia—Okanagan, Kelowna, British Columbia, Canada
- Mindy A. Smith
  - Department of Family Medicine, Michigan State University, East Lansing, Michigan, USA
- Lana Moroz
  - Cardiac Atrial Fibrillation Specialty Clinic, Vancouver General Hospital, Vancouver, British Columbia, Canada
12
Logan R, Wehe AW, Woods DC, Tilly J, Khrapko K. Interpreting Sequence-Levenshtein distance for determining error type and frequency between two embedded sequences of equal length. arXiv 2023: arXiv:2310.12833v1. [PMID: 37904736 PMCID: PMC10614987] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/01/2023]
Abstract
Levenshtein distance is a commonly used edit distance metric, typically applied in language processing, and to a lesser extent, in molecular biology analysis. Biological nucleic acid sequences are often embedded in longer sequences and are subject to insertion and deletion errors that introduce frameshift during sequencing. These frameshift errors are due to string context and should not be counted as true biological errors. Sequence-Levenshtein distance is a modification to Levenshtein distance that is permissive of frameshift error without additional penalty. However, in a biological context Levenshtein distance needs to accommodate both frameshift and weighted errors, which Sequence-Levenshtein distance cannot do. Errors are weighted when they are associated with a numerical cost that corresponds to their frequency of appearance. Here, we describe a modification that allows the use of Levenshtein distance and Sequence-Levenshtein distance to appropriately accommodate penalty-free frameshift between embedded sequences and correctly weight specific error types.
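The distinction drawn in this abstract can be illustrated with a short sketch. The code below is not the authors' modification; it is a minimal illustration, assuming the commonly used formulation in which Sequence-Levenshtein distance is the minimum over the last row and last column of the ordinary dynamic-programming table, so that trailing insertions or deletions caused by frameshift in an embedded sequence carry no extra penalty. The function names and the per-operation cost parameters (`sub_cost`, `ins_cost`, `del_cost`) are hypothetical and stand in for whatever error weighting a real analysis would use.

```python
def levenshtein_matrix(a, b, sub_cost=1.0, ins_cost=1.0, del_cost=1.0):
    """Fill the standard edit-distance table; d[i][j] compares a[:i] with b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0.0] * cols for _ in range(rows)]
    for i in range(1, rows):
        d[i][0] = i * del_cost          # delete every character of a[:i]
    for j in range(1, cols):
        d[0][j] = j * ins_cost          # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            mismatch = 0.0 if a[i - 1] == b[j - 1] else sub_cost
            d[i][j] = min(
                d[i - 1][j] + del_cost,      # deletion
                d[i][j - 1] + ins_cost,      # insertion
                d[i - 1][j - 1] + mismatch,  # match or substitution
            )
    return d


def levenshtein(a, b, **costs):
    """Ordinary (optionally weighted) Levenshtein distance."""
    return levenshtein_matrix(a, b, **costs)[len(a)][len(b)]


def sequence_levenshtein(a, b, **costs):
    """Minimum over the last row and last column of the table.

    Assumed formulation: edits needed only to realign the tail of an
    embedded sequence after a frameshift are not charged.
    """
    d = levenshtein_matrix(a, b, **costs)
    last_row = d[len(a)]
    last_col = [row[len(b)] for row in d]
    return min(min(last_row), min(last_col))


if __name__ == "__main__":
    # One deletion at the start shifts the next base into the window.
    print(levenshtein("ACGT", "CGTA"))           # counts the deletion and the shifted-in base
    print(sequence_levenshtein("ACGT", "CGTA"))  # counts only the deletion
```

Passing unequal costs (for example a lower `sub_cost`) gives a weighted variant of either distance, which is the kind of error weighting the abstract refers to.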
Affiliation(s)
- Robert Logan
  - Science and Technology Division, Biology and Bioinformatics Department, Eastern Nazarene College, Quincy, MA 02170
- Amy Wangsness Wehe
  - Health and Natural Sciences Division, Mathematics Department, Fitchburg State University, Fitchburg, MA 01420-2697
- Dori C Woods
  - College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA 02115
- Jon Tilly
  - College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA 02115
- Konstantin Khrapko
  - College of Science, Department of Biology, Northeastern University, 330 Huntington Ave, Boston, MA 02115
13
Gimpel AL, Stark WJ, Heckel R, Grass RN. A digital twin for DNA data storage based on comprehensive quantification of errors and biases. Nat Commun 2023; 14:6026. [PMID: 37758710 PMCID: PMC10533828 DOI: 10.1038/s41467-023-41729-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Accepted: 09/18/2023] [Indexed: 09/29/2023] Open
Abstract
Archiving data in synthetic DNA offers unprecedented storage density and longevity. Handling and storage introduce errors and biases into DNA-based storage systems, necessitating the use of Error Correction Coding (ECC) which comes at the cost of added redundancy. However, insufficient data on these errors and biases, as well as a lack of modeling tools, limit data-driven ECC development and experimental design. In this study, we present a comprehensive characterisation of the error sources and biases present in the most common DNA data storage workflows, including commercial DNA synthesis, PCR, decay by accelerated aging, and sequencing-by-synthesis. Using the data from 40 sequencing experiments, we build a digital twin of the DNA data storage process, capable of simulating state-of-the-art workflows and reproducing their experimental results. We showcase the digital twin's ability to replace experiments and rationalize the design of redundancy in two case studies, highlighting opportunities for tangible cost savings and data-driven ECC development.
Affiliation(s)
- Andreas L Gimpel
  - Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Wendelin J Stark
  - Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
- Reinhard Heckel
  - Department of Computer Engineering, Technical University of Munich, Arcistrasse 21, 80333, Munich, Germany
- Robert N Grass
  - Department of Chemistry and Applied Biosciences, ETH Zürich, Vladimir-Prelog-Weg 1-5, 8093, Zürich, Switzerland
14
Pan S, Xia L, Xu L, Li Z. SubMDTA: drug target affinity prediction based on substructure extraction and multi-scale features. BMC Bioinformatics 2023; 24:334. [PMID: 37679724 PMCID: PMC10485962 DOI: 10.1186/s12859-023-05460-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/11/2023] [Accepted: 08/31/2023] [Indexed: 09/09/2023] Open
Abstract
BACKGROUND Drug-target affinity (DTA) prediction is a critical step in the field of drug discovery. In recent years, deep learning-based methods have emerged for DTA prediction. In order to solve the problem of fusion of substructure information of drug molecular graphs and utilize multi-scale information of protein, a self-supervised pre-training model based on substructure extraction and multi-scale features is proposed in this paper. RESULTS For drug molecules, the model obtains substructure information through the method of probability matrix, and the contrastive learning method is implemented on the graph-level representation and subgraph-level representation to pre-train the graph encoder for downstream tasks. For targets, a BiLSTM method that integrates multi-scale features is used to capture long-distance relationships in the amino acid sequence. The experimental results showed that our model achieved better performance for DTA prediction. CONCLUSIONS The proposed model improves the performance of the DTA prediction, which provides a novel strategy based on substructure extraction and multi-scale features.
Affiliation(s)
- Shourun Pan
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Leiming Xia
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Lei Xu
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Zhen Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
15
Wang Z, Wang H, Zhao J, Zheng C. scSemiAAE: a semi-supervised clustering model for single-cell RNA-seq data. BMC Bioinformatics 2023; 24:217. [PMID: 37237310 DOI: 10.1186/s12859-023-05339-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/26/2022] [Accepted: 05/16/2023] [Indexed: 05/28/2023] Open
Abstract
BACKGROUND Single-cell RNA sequencing (scRNA-seq) strives to capture cellular diversity with higher resolution than bulk RNA sequencing. Clustering analysis is critical to transcriptome research as it allows for further identification and discovery of new cell types. Unsupervised clustering cannot integrate prior knowledge where relevant information is widely available. Purely unsupervised clustering algorithms may not yield biologically interpretable clusters when confronted with the high dimensionality of scRNA-seq data and frequent dropout events, which makes identification of cell types more challenging. RESULTS We propose scSemiAAE, a semi-supervised clustering model for scRNA sequence analysis using deep generative neural networks. Specifically, scSemiAAE carefully designs a ZINB adversarial autoencoder-based architecture that inherently integrates adversarial training and semi-supervised modules in the latent space. In a series of experiments on scRNA-seq datasets spanning thousands to tens of thousands of cells, scSemiAAE can significantly improve clustering performance compared to dozens of unsupervised and semi-supervised algorithms, promoting clustering and interpretability of downstream analyses. CONCLUSION scSemiAAE is a Python-based algorithm implemented on the VSCode platform that provides efficient visualization, clustering, and cell type assignment for scRNA-seq data. The tool is available from https://github.com/WHang98/scSemiAAE .
Affiliation(s)
- Zile Wang
  - School of Mathematics and System Science, Xinjiang University, Urumqi, China
- Haiyun Wang
  - School of Mathematics and System Science, Xinjiang University, Urumqi, China
- Jianping Zhao
  - School of Mathematics and System Science, Xinjiang University, Urumqi, China
- Chunhou Zheng
  - School of Mathematics and System Science, Xinjiang University, Urumqi, China
  - School of Computer Science and Technology, Anhui University, Hefei, China
16
Li P, Chen J, Chen Y, Song S, Huang X, Yang Y, Li Y, Tong Y, Xie Y, Li J, Li S, Wang J, Qian K, Wang C, Du L. Construction of Exosome SORL1 Detection Platform Based on 3D Porous Microfluidic Chip and its Application in Early Diagnosis of Colorectal Cancer. Small (Weinheim an der Bergstrasse, Germany) 2023; 19:e2207381. [PMID: 36799198 DOI: 10.1002/smll.202207381] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/27/2022] [Revised: 01/29/2023] [Indexed: 05/18/2023]
Abstract
Exosomes are promising new biomarkers for colorectal cancer (CRC) diagnosis, due to their rich biological fingerprints and high level of stability. However, the difficulty of accurately detecting exosomes with specific surface receptors limits their clinical application. Herein, an exosome enrichment platform on a 3D porous sponge microfluidic chip is constructed, and the exosome capture efficiency of this chip is ≈90%. Also, deep mass spectrometry analysis followed by multi-level expression screenings revealed a CRC-specific exosome membrane protein (SORL1). A method of SORL1 detection by specific quantum dot labeling is further designed, and an ensemble classification system is established by extracting features from 64-patched fluorescence images. Importantly, the area under the curve (AUC) using this system is 0.99, which is significantly higher (p < 0.001) than that using a conventional biomarker (carcinoembryonic antigen (CEA), AUC of 0.71). The system showed similar diagnostic performance for early-stage CRC, young CRC, and CEA-negative CRC patients.
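For readers unfamiliar with the AUC figures quoted here, the sketch below shows one standard way the statistic can be computed: as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. It is a generic illustration, not the paper's classification system; the function name `auc_from_scores` and the example scores are hypothetical.

```python
def auc_from_scores(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case scores higher; a tie contributes half a win.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical classifier scores, invented for illustration only.
    patients = [0.9, 0.4]
    controls = [0.5, 0.1]
    print(auc_from_scores(patients, controls))
```

An AUC of 1.0 means every positive case outranks every negative case, and 0.5 corresponds to chance-level ranking.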
Affiliation(s)
- Peilong Li
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Jiaci Chen
  - State Key Laboratory of Biobased Material and Green Papermaking, Department of Bioengineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250300, China
- Yuqing Chen
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Shangling Song
  - Department of medical engineering equipment, The Second Hospital of Shandong University, Jinan, 250033, China
- Xiaowen Huang
  - State Key Laboratory of Biobased Material and Green Papermaking, Department of Bioengineering, Qilu University of Technology (Shandong Academy of Sciences), Jinan, 250300, China
- Yang Yang
  - School of Information Science and Engineering, Shandong University, Jinan, 250000, China
- Yanru Li
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Yao Tong
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Yan Xie
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Juan Li
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Shunxiang Li
  - State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
  - Department of Obstetrics and Gynecology, Department of Cardiology, Shanghai Key Laboratory of Gynecologic Oncology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Jiayi Wang
  - Country Department of Clinical Laboratory Medicine, Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, 200030, China
- Kun Qian
  - State Key Laboratory for Oncogenes and Related Genes, School of Biomedical Engineering, Institute of Medical Robotics and Med-X Research Institute, Shanghai Jiao Tong University, Shanghai, 200030, China
  - Department of Obstetrics and Gynecology, Department of Cardiology, Shanghai Key Laboratory of Gynecologic Oncology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, 200127, China
- Chuanxin Wang
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
- Lutao Du
  - Department of Clinical Laboratory, The Second Hospital of Shandong University, Jinan, 250033, China
17
Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. Predicting disease genes based on multi-head attention fusion. BMC Bioinformatics 2023; 24:162. [PMID: 37085750 PMCID: PMC10122338 DOI: 10.1186/s12859-023-05285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 04/12/2023] [Indexed: 04/23/2023] Open
Abstract
BACKGROUND The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Due to the sparsity and complexity of biomedical data, it is still a challenge to develop an effective multi-feature fusion model to identify disease genes. RESULTS This paper proposes an approach to predict pathogenic genes based on multi-head attention fusion (MHAGP). First, heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Second, two graph representation learning algorithms are used to capture the feature vectors of gene-disease pairs from the network, and the features are fused by introducing multi-head attention. Finally, a multi-layer perceptron model is used to predict the gene-disease association. CONCLUSIONS The MHAGP model outperforms all other methods in comparative experiments. Case studies also show that MHAGP is able to predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug and disease phenotype-gene ontology associations, can be added to expand the information in heterogeneous biological networks and achieve more accurate predictions. In addition, because it is readily extensible, MHAGP can be used for related tasks such as gene-drug association and drug-disease association prediction.
Affiliation(s)
- Linlin Zhang
  - College of Software Engineering, Xinjiang University, Urumqi, China
- Dianrong Lu
  - College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xuehua Bi
  - Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Kai Zhao
  - College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Guanglei Yu
  - Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Na Quan
  - College of Information Science and Engineering, Xinjiang University, Urumqi, China
18
Lin CL, Wu KC. Development of revised ResNet-50 for diabetic retinopathy detection. BMC Bioinformatics 2023; 24:157. [PMID: 37076790 PMCID: PMC10114328 DOI: 10.1186/s12859-023-05293-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/25/2022] [Accepted: 04/15/2023] [Indexed: 04/21/2023] Open
Abstract
BACKGROUND Diabetic retinopathy (DR) produces bleeding, exudation, and new blood vessel formation conditions. DR can damage the retinal blood vessels and cause vision loss or even blindness. If DR is detected early, ophthalmologists can use lasers to create tiny burns around the retinal tears to inhibit bleeding and prevent the formation of new blood vessels, in order to prevent deterioration of the disease. The rapid improvement of deep learning has made image recognition an effective technology; it can avoid misjudgments caused by different doctors' evaluations and help doctors to predict the condition quickly. The aim of this paper is to adopt visualization and preprocessing in the ResNet-50 model to improve module calibration, to enable the model to predict DR accurately. RESULTS This study compared the performance of the proposed method with other common CNN models (Xception, AlexNet, VggNet-s, VggNet-16 and ResNet-50). In examining these models, the results pointed to an over-fitting phenomenon, and the outcome of the work demonstrates that the performance of the revised ResNet-50 (train accuracy: 0.8395; test accuracy: 0.7432) is better than that of the other common CNNs; that is, the revised structure of ResNet-50 could avoid the overfitting problem, decrease the loss value, and reduce the fluctuation problem. CONCLUSIONS This study proposed two approaches to designing the DR grading system: a standard operation procedure (SOP) for preprocessing the fundus image, and a revised structure of ResNet-50, including an adaptive learning rate to adjust the weight of layers, regularization, and changes to the structure of ResNet-50, which was selected for its suitable features. It is worth noting that the purpose of this study was not to design the most accurate DR screening network, but to demonstrate the effect of the SOP of DR and the visualization of the revised ResNet-50 model. The results provided insight into revising the structure of CNNs using the visualization tool.
Affiliation(s)
- Chun-Ling Lin
  - Department of Electrical Engineering, Ming Chi University of Technology, No. 84, Gongzhuan Rd., Taishan Dist., New Taipei City, 243, Taiwan
- Kun-Chi Wu
  - Department of Electrical Engineering, Ming Chi University of Technology, No. 84, Gongzhuan Rd., Taishan Dist., New Taipei City, 243, Taiwan
19
Sreedharan V, Bhalla US, Ramakrishnan N. Using sensitivity analyses to understand bistable system behavior. BMC Bioinformatics 2023; 24:136. [PMID: 37024783 PMCID: PMC10080961 DOI: 10.1186/s12859-023-05206-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/12/2022] [Accepted: 02/24/2023] [Indexed: 04/08/2023] Open
Abstract
BACKGROUND Bistable systems, i.e., systems that exhibit two stable steady states, are of particular interest in biology. They can implement binary cellular decision making, e.g., in pathways for cellular differentiation and cell cycle regulation. The onset of cancer, prion diseases, and neurodegenerative diseases are known to be associated with malfunctioning bistable systems. Exploring and characterizing parameter spaces in bistable systems, so that they retain or lose bistability, is part of a lot of therapeutic research such as cancer pharmacology. RESULTS We use eigenvalue sensitivity analysis and stable state separation sensitivity analysis to understand bistable system behaviors, and to characterize the most sensitive parameters of a bistable system. While eigenvalue sensitivity analysis is an established technique in engineering disciplines, it has not been frequently used to study biological systems. We demonstrate the utility of these approaches on a published bistable system. We also illustrate scalability and generalizability of these methods to larger bistable systems. CONCLUSIONS Eigenvalue sensitivity analysis and separation sensitivity analysis prove to be promising tools to define parameter design rules to make switching decisions between either stable steady state of a bistable system and a corresponding monostable state after bifurcation. These rules were applied to the smallest two-component bistable system and results were validated analytically. We showed that with multiple parameter settings of the same bistable system, we can design switching to a desirable state to retain or lose bistability when the most sensitive parameter is varied according to our parameter perturbation recommendations. We propose eigenvalue and stable state separation sensitivity analyses as a framework to evaluate large and complex bistable systems.
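The two analyses named in this abstract can be sketched on a toy model. The code below is an illustration only and does not reproduce the published system or the authors' implementation. It assumes a hypothetical two-variable model, dx/dt = a·x − x³ and dy/dt = x − y, which has two stable steady states at x = y = ±√a for a > 0 and loses bistability as a approaches 0. Eigenvalue sensitivity is estimated as the finite-difference derivative of the dominant Jacobian eigenvalue with respect to the parameter a, and separation sensitivity as the derivative of the distance between the two stable states along x. All function names are assumptions.

```python
import cmath
import math


def jacobian(x, a):
    """Jacobian of the toy model (a*x - x**3, x - y); it does not depend on y."""
    return [[a - 3.0 * x * x, 0.0],
            [1.0, -1.0]]


def eigenvalues_2x2(m):
    """Closed-form eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + disc) / 2.0, (tr - disc) / 2.0


def dominant_eigenvalue(a):
    """Largest real part of the Jacobian eigenvalues at the stable state x = sqrt(a)."""
    x_star = math.sqrt(a)
    return max(e.real for e in eigenvalues_2x2(jacobian(x_star, a)))


def state_separation(a):
    """Distance along x between the two stable steady states (+sqrt(a) and -sqrt(a))."""
    return 2.0 * math.sqrt(a)


def sensitivity(fn, a, h=1e-6):
    """Central finite-difference estimate of d fn / d a."""
    return (fn(a + h) - fn(a - h)) / (2.0 * h)


if __name__ == "__main__":
    a = 0.25
    print("dominant eigenvalue:", dominant_eigenvalue(a))
    print("eigenvalue sensitivity:", sensitivity(dominant_eigenvalue, a))
    print("separation sensitivity:", sensitivity(state_separation, a))
```

In this toy model a negative dominant eigenvalue indicates a stable state, and the sign of its sensitivity shows whether increasing the parameter strengthens or weakens that stability. A larger system would need a numerical steady-state solver and a general eigenvalue routine in place of the closed forms used here.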
Affiliation(s)
- Vandana Sreedharan
  - Genetics, Bioinformatics, and Computational Biology, Virginia Tech, Blacksburg, VA, 24061, USA
- Upinder S Bhalla
  - National Centre for Biological Sciences, TIFR, Bellary Road, Bangalore, 560065, India
- Naren Ramakrishnan
  - Department of Computer Science, Virginia Tech, Arlington, VA, 22203, USA
20
Singh S, Choudhury A, Hazelhurst S, Crowther N, Boua P, Sorgho H, Agongo G, Nonterah E, Micklesfield L, Norris S, Kisiangani I, Mohamed S, Gomez-Olive F, Tollman S, Choma S, Brandenburg JT, Ramsay M. Genome-wide Association Study Meta-analysis of Blood Pressure Traits and Hypertension in Sub-Saharan African Populations: An AWI-Gen Study. Research Square 2023: rs.3.rs-2532794. [PMID: 36824767 PMCID: PMC9949264 DOI: 10.21203/rs.3.rs-2532794/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
Abstract
Most hypertension-related genome-wide association studies (GWAS) focus on non-African populations, despite hypertension (a major risk factor for cardiovascular disease) being highly prevalent in Africa. The AWI-Gen study GWAS meta-analysis for blood pressure-related traits (systolic and diastolic blood pressure, pulse pressure, mean-arterial pressure and hypertension) from three sub-Saharan African geographic regions (N=10,775), identified two genome-wide significant signals (p<5E-08): systolic blood pressure near P2RY1 (rs77846204; intergenic variant, p=4.25E-08) and pulse pressure near Linc01256 (rs80141533; intergenic variant, p=4.25E-08). No genome-wide signals were detected for the AWI-Gen GWAS meta-analysis with previous African-ancestry GWASs (UK Biobank (African), Uganda Genome Resource). Suggestive signals (p<5E-06) were observed for all traits, with 29 displaying pleiotropic effects and several replicating known associations. Polygenic risk scores developed from studies on different ancestries had limited transferability, with multi-ancestry models providing better prediction. This study provides insights into the genetics and physiology of blood pressure variation in African populations.
Affiliation(s)
- Surina Singh
  - Sydney Brenner Institute for Molecular Bioscience (SBIMB), University of the Witwatersrand
- Scott Hazelhurst
  - Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences & School of Electrical & Information Engineering, University of the Witwatersrand
- Nigel Crowther
  - Department of Chemical Pathology, National Health Laboratory Service
- Palwende Boua
  - Clinical Research Unit of Nanoro, Institut de Recherche en Sciences de la Santé
- Hermann Sorgho
  - Clinical Research Unit of Nanoro, Institut de Recherche en Sciences de la Santé
- Shane Norris
  - SAMRC Developmental Pathways For Health Research Unit, Department of Paediatrics & Child Health, University of the Witwatersrand, Johannesburg, South Africa
- Francesc Gomez-Olive
  - MRC/Wits Rural Public Health and Health Transitions Research Unit (Agincourt), School of Public Health, Faculty of Health Sciences, University of the Witwatersrand
21
Welsh C, Xu J, Smith L, König M, Choi K, Sauro HM. libRoadRunner 2.0: a high performance SBML simulation and analysis library. Bioinformatics 2023; 39:6883908. [PMID: 36478036 PMCID: PMC9825722 DOI: 10.1093/bioinformatics/btac770] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 06/15/2022] [Revised: 11/22/2022] [Accepted: 12/07/2022] [Indexed: 12/13/2022] Open
Abstract
MOTIVATION This article presents libRoadRunner 2.0, an extensible, high-performance, cross-platform, open-source software library for the simulation and analysis of models expressed using the systems biology markup language (SBML). RESULTS libRoadRunner is a self-contained library, able to run either as a component inside other tools via its C++, C and Python APIs, or interactively through its Python or Julia interface. libRoadRunner uses a custom just-in-time (JIT) compiler built on the widely used LLVM JIT compiler framework. It compiles SBML-specified models directly into native machine code for a large variety of processors, making it fast enough to simulate extremely large models or repeated runs in reasonable timeframes. libRoadRunner is flexible, supporting the bulk of the SBML specification (except for delay and non-linear algebraic equations) as well as several SBML extensions such as hierarchical composition and probability distributions. It offers multiple deterministic and stochastic integrators, as well as tools for steady-state, sensitivity, stability and structural analyses. AVAILABILITY AND IMPLEMENTATION libRoadRunner binary distributions for Windows, Mac OS and Linux, Julia and Python bindings, source code and documentation are all available at https://github.com/sys-bio/roadrunner, and Python bindings are also available via pip. The source code can be compiled for the supported systems as well as in principle any system supported by LLVM-13, such as ARM-based computers like the Raspberry Pi. The library is licensed under the Apache License Version 2.0.
Affiliation(s)
- Ciaran Welsh
  - Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
- Jin Xu
  - Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
- Lucian Smith
  - Department of Bioengineering, University of Washington, Seattle, WA 98195, USA
- Matthias König
  - Institute of Biology, Institute of Theoretical Biology, Humboldt-University Berlin, Berlin 10115, Germany
- Kiri Choi
  - School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
22
Zahra N, Zeshan B, Ishaq M. Carbapenem resistance gene crisis in A. baumannii: a computational analysis. BMC Microbiol 2022; 22:290. [PMID: 36463105 PMCID: PMC9719202 DOI: 10.1186/s12866-022-02706-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/12/2022] [Accepted: 11/14/2022] [Indexed: 12/05/2022] Open
Abstract
Acinetobacter baumannii (A. baumannii) is a member of the ESKAPE group of bacteria and is considered multidrug resistant globally. The objective of this study is to determine the protein docking of different antibiotic resistance genes (ARGs) in A. baumannii. The carbapenem resistance genes analysed in silico in A. baumannii were blaOXA-51, blaOXA-23, blaOXA-58, blaOXA-24, blaOXA-143, NMD-1 and IMP-1. Doripenem, imipenem and meropenem were docked to blaOXA-51 and blaOXA-23 using PyRx. For blaOXA-51, the top docking energy was -5.5 kcal/mol for imipenem, while doripenem and meropenem each showed a binding score of -5.2 kcal/mol. For blaOXA-23, the docking energy was -4.3 kcal/mol for imipenem, -2.3 kcal/mol for meropenem, and -3.4 kcal/mol for doripenem. Similarly, doripenem, imipenem and meropenem were docked to blaOXA-58, IMP-1, RecA and blaOXA-143. For blaOXA-58, the docking energy was -8.8 kcal/mol for doripenem and meropenem each, while imipenem showed a binding score of -4.2 kcal/mol. For IMP-1, the binding energy was -5.7 kcal/mol for meropenem, -5.3 kcal/mol for doripenem, and -4.5 kcal/mol for imipenem. For RecA, the docking energy was -4.9 kcal/mol for imipenem, -3.6 kcal/mol for meropenem, and -3.9 kcal/mol for doripenem. For blaOXA-143, the docking energy was -3.0 kcal/mol for imipenem, -1.9 kcal/mol for meropenem, and -2.5 kcal/mol for doripenem. Docking with blaOXA-24 gave the highest docking energy for doripenem (-5.5 kcal/mol), with binding scores of -4.0 kcal/mol for meropenem and -3.9 kcal/mol for imipenem. PyRx was also used to dock doripenem, imipenem, and meropenem to NMD-1: the docking energy was -4.0 kcal/mol for doripenem, -3.3 kcal/mol for meropenem, and -1.50 kcal/mol for imipenem. To the best of our knowledge, the mechanism underlying phenotypic and genotypic carbapenem resistance in A. baumannii, as assessed by molecular docking, is unclear. Our molecular docking identifies a possible protein-targeting mechanism for carbapenem-resistant A. baumannii.
Affiliation(s)
- Nureen Zahra
  - Department of Microbiology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan (grid.444936.8; 0000 0004 0608 9608)
  - Institute of Molecular Biology and Biotechnology, The University of Lahore, Lahore, Pakistan (grid.440564.7; 0000 0001 0415 4232)
- Basit Zeshan
  - Department of Microbiology, Faculty of Science & Technology, University of Central Punjab, Lahore, Pakistan (grid.444936.8; 0000 0004 0608 9608)
  - Faculty of Sustainable Agriculture, Universiti Malaysia Sabah, 90905 Sandakan, Sabah, Malaysia (grid.265727.3; 0000 0001 0417 0814)
- Musarat Ishaq
  - Lymphatics and Regenerative Surgery Laboratory, O’Brien Institute and St Vincent’s Institute, Fitzroy, Australia (grid.1073.5; 0000 0004 0626 201X)
23
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 85] [Impact Index Per Article: 28.3] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
  - Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
  - Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
  - Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
  - Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt