1
Yang Z, Wu Y, Liu H, He L, Deng X. AMYGNN: A Graph Convolutional Neural Network-Based Approach for Predicting Amyloid Formation from Polypeptides. J Chem Inf Model 2024; 64:1751-1762. [PMID: 38408296 DOI: 10.1021/acs.jcim.3c02035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/28/2024]
Abstract
There has been an increasing interest in the use of amyloids for constructing various functional materials. The design of amyloid-associated functional materials requires the identification of the core peptide sequences as the fundamental building block. The existing computational methods are limited in terms of delineating polypeptides, the typical non-Euclidean structural data, and they fail to capture the dynamic interactions between amino acids due to ignoring the contextual information from surrounding amino acids. Here, we first propose the use of a state-of-the-art graph convolutional neural network for predicting the trends of amyloid formation from specific peptide sequences (AMYGNN) by abstracting each polypeptide as a graph, in which the constituting amino acids are viewed as nodes and edges characterizing the connections between pairs of amino acids are established when they meet a given distance threshold (Cα-Cα ≤ 5 Å). Our model achieves high performance with accuracy (0.9208), G-mean (0.9203), MCC (0.8417), and F1 (0.9235) in determining the characteristic peptide sequences to form amyloid. 32 of 534 crucial amino acid properties that greatly contribute to the formation of amyloids are ascertained, and the β-folding-like graph structure of a polypeptide is believed to be essential for the formation of amyloid. Our model enables the mapping of polypeptides with underlying interactions between amino acids and provides a quick and precise predictive framework for directing the construction of amyloid-associated functional materials.
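The graph abstraction described in this abstract (residues as nodes, an edge wherever two Cα atoms lie within 5 Å) can be sketched as follows. This is a minimal illustration, not the AMYGNN implementation: the function name `contact_graph` and the toy coordinates are hypothetical, and only the 5 Å threshold comes from the abstract.

```python
import math

def contact_graph(ca_coords, threshold=5.0):
    """Build an adjacency list linking residues whose C-alpha atoms
    lie within `threshold` angstroms of each other."""
    n = len(ca_coords)
    adjacency = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= threshold:
                adjacency[i].append(j)
                adjacency[j].append(i)
    return adjacency

# Hypothetical coordinates: four C-alpha atoms spaced 3.8 angstroms apart on a line.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
graph = contact_graph(coords)
print(graph)
```

With these toy coordinates only sequence neighbors fall under the threshold, so the graph is a simple chain; folded structures would add edges between residues that are distant in sequence but close in space.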
Affiliation(s)
- Zuojun Yang
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Yuhan Wu
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Hao Liu
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Li He
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
- Xiaoyuan Deng
  - MOE Key Laboratory of Laser Life Science & Institute of Laser Life Science, College of Biophotonics, South China Normal University, Guangzhou 510631, China
  - Guangdong Provincial Key Laboratory of Laser Life Science, and Guangzhou Key Laboratory of Spectral Analysis and Functional Probes, College of Biophotonics, South China Normal University, Guangzhou 510631, China
2
Fan X, Pan H, Tian A, Chung WK, Shen Y. SHINE: protein language model-based pathogenicity prediction for short inframe insertion and deletion variants. Brief Bioinform 2023; 24:bbac584. [PMID: 36575831 PMCID: PMC9851320 DOI: 10.1093/bib/bbac584] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/18/2022] [Revised: 11/04/2022] [Accepted: 11/29/2022] [Indexed: 12/29/2022] Open
Abstract
Accurate variant pathogenicity predictions are important in genetic studies of human diseases. Inframe insertion and deletion variants (indels) alter protein sequence and length, but are not as deleterious as frameshift indels. Inframe indel interpretation is challenging because few known pathogenic variants are available for training. Existing prediction methods largely use manually encoded features including conservation, protein structure and function, and allele frequency to infer variant pathogenicity. Recent advances in deep learning modeling of protein sequences and structures provide an opportunity to improve the representation of salient features based on large numbers of protein sequences. We developed a new pathogenicity predictor for SHort Inframe iNsertion and dEletion (SHINE). SHINE uses pretrained protein language models to construct a latent representation of an indel and its protein context from protein sequences and multiple protein sequence alignments, and feeds the latent representation into supervised machine learning models for pathogenicity prediction. We curated training data from ClinVar and gnomAD, and created two test datasets from different sources. SHINE achieved better prediction performance than existing methods for both deletion and insertion variants in these two test datasets. Our work suggests that unsupervised protein language models can provide valuable information about proteins, and new methods based on these models can improve variant interpretation in genetic analyses.
Affiliation(s)
- Xiao Fan
  - Department of Pediatrics, Columbia University, New York, NY, USA
  - Department of Systems Biology, Columbia University, New York, NY, USA
- Hongbing Pan
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Alan Tian
  - Lynbrook High School, San Jose, CA, USA
- Wendy K Chung
  - Department of Pediatrics, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Yufeng Shen
  - Department of Systems Biology, Columbia University, New York, NY, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY, USA
  - JP Sulzberger Columbia Genome Center, Columbia University, New York, NY, USA
3
Jan A, Hayat M, Wedyan M, Alturki R, Gazzawe F, Ali H, Alarfaj FK. Target-AMP: Computational prediction of antimicrobial peptides by coupling sequential information with evolutionary profile. Comput Biol Med 2022; 151:106311. [PMID: 36410097 DOI: 10.1016/j.compbiomed.2022.106311] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/23/2022] [Revised: 11/02/2022] [Accepted: 11/13/2022] [Indexed: 11/18/2022]
Abstract
Antimicrobial peptides (AMPs) are gaining a lot of attention as cutting-edge treatments for many infectious disorders. The effectiveness of AMPs against bacteria, fungi, and viruses has persisted for a long period, making them the greatest option for addressing the growing problem of antibiotic resistance. Due to their wide-ranging actions, AMPs have become more prominent, particularly in therapeutic applications. The prediction of AMPs has become a difficult task for academics due to the explosive increase of AMPs documented in databases. Wet-lab investigations to find antimicrobial peptides are exceedingly costly, time-consuming, and even impossible for some species. Therefore, an efficient computational method must be developed to choose the optimal AMP candidates prior to in vitro trials. In this study, an effort was made to develop a machine learning-based classification system that is effective and accurate and that can distinguish antimicrobial peptides from non-antimicrobial peptides. The position-specific scoring matrix (PSSM), pseudo amino acid composition, dipeptide composition, and a combination of these three were utilized in the suggested scheme to extract salient features from AMP sequences. The classification techniques K-nearest neighbor (KNN), Random Forest (RF), and Support Vector Machine (SVM) were employed. On the independent dataset and training dataset, the accuracy levels achieved by the suggested predictor (Target-AMP) are 97.07% and 95.71%, respectively. The results show that, when compared to other techniques currently used in the literature, our Target-AMP had the best success rate.
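Of the three feature types named in this abstract, dipeptide composition is the simplest to illustrate: it maps a sequence to a 400-dimensional vector holding the relative frequency of each ordered pair of adjacent residues. The sketch below assumes the 20 standard amino acids and skips pairs containing any other symbol; the function name and the example peptide are hypothetical and are not taken from Target-AMP.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order that defines the feature layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]

def dipeptide_composition(sequence):
    """Return the fraction of overlapping adjacent residue pairs
    that match each of the 400 dipeptides."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for first, second in zip(sequence, sequence[1:]):
        pair = first + second
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]

# Hypothetical 8-residue peptide, giving 7 overlapping pairs.
features = dipeptide_composition("GIGKFLKK")
print(len(features), features[DIPEPTIDES.index("KK")])
```

Because the vector length is fixed regardless of peptide length, it can be fed directly to classifiers such as KNN, RF, or SVM.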
Affiliation(s)
- Asad Jan
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Mohammad Wedyan
  - Department of Autonomous Systems, Faculty of Artificial Intelligence, Al-Balqa Applied University, Al-Salt, 19117, Jordan
- Ryan Alturki
  - Department of Information Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Foziah Gazzawe
  - Department of Information Science, College of Computer and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Hashim Ali
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Fawaz Khaled Alarfaj
  - College of Computer & Information Technology, King Faisal University, Saudi Arabia
4
Xia Q, Shu Z, Ye T, Zhang M. Identification and Analysis of the Blood lncRNA Signature for Liver Cirrhosis and Hepatocellular Carcinoma. Front Genet 2020; 11:595699. [PMID: 33365048 PMCID: PMC7750531 DOI: 10.3389/fgene.2020.595699] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/17/2020] [Accepted: 10/13/2020] [Indexed: 12/12/2022] Open
Abstract
As one of the most common malignant tumors, hepatocellular carcinoma (HCC) is the fifth major cause of cancer-associated mortality worldwide. In 90% of cases, HCC develops in the context of liver cirrhosis and chronic hepatitis B virus (HBV) infection is an important etiology for cirrhosis and HCC, accounting for 53% of all HCC cases. To understand the underlying mechanisms of the dynamic chain reactions from normal to HBV infection, from HBV infection to liver cirrhosis, from liver cirrhosis to HCC, we analyzed the blood lncRNA expression profiles from 38 healthy control samples, 45 chronic hepatitis B patients, 46 liver cirrhosis patients, and 46 HCC patients. Advanced machine-learning methods including Monte Carlo feature selection, incremental feature selection (IFS), and support vector machine (SVM) were applied to discover the signature associated with HCC progression and construct the prediction model. One hundred seventy-one key HCC progression-associated lncRNAs were identified and their overall accuracy was 0.823 as evaluated with leave-one-out cross validation (LOOCV). The accuracies of the lncRNA signature for healthy control, chronic hepatitis B, liver cirrhosis, and HCC were 0.895, 0.711, 0.870, and 0.826, respectively. The 171-lncRNA signature is not only useful for early detection and intervention of HCC, but also helpful for understanding the multistage tumorigenic processes of HCC.
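Incremental feature selection with leave-one-out cross-validation, as named in this abstract, can be sketched in a few lines. The version below is an illustration under stated assumptions: a nearest-centroid classifier stands in for the SVM so the example needs no third-party library, the feature ranking is assumed to be given (the study obtains it from Monte Carlo feature selection), and the toy data are invented.

```python
import math

def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose class centroid is closest to x
    (a dependency-free stand-in for the SVM used in the study)."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, math.inf
    for label, acc in sums.items():
        centroid = [value / counts[label] for value in acc]
        dist = math.dist(centroid, x)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_accuracy(X, y):
    """Hold out each sample once, train on the rest, and report the hit rate."""
    hits = 0
    for i in range(len(X)):
        train_X, train_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the top-1, top-2, ... ranked features and keep the
    smallest prefix with the best LOOCV accuracy."""
    best_k, best_acc = 0, -1.0
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        subset = [[row[c] for c in cols] for row in X]
        acc = loocv_accuracy(subset, y)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked_features[:best_k], best_acc

# Invented toy data: feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 5.0], [0.2, 1.0], [0.1, 9.0], [1.0, 2.0], [1.2, 8.0], [1.1, 4.0]]
y = ["control"] * 3 + ["case"] * 3
selected, accuracy = incremental_feature_selection(X, y, ranked_features=[0, 1])
print(selected, accuracy)
```

Using a strict `>` when comparing accuracies means ties are resolved in favor of the shorter feature list, which matches the usual goal of a compact signature.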
Affiliation(s)
- Qi Xia
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
  - Key Laboratory for Biomedical Engineering of Ministry of Education, Zhejiang University, Hangzhou, China
  - Zhejiang University, Hangzhou, China
- Zheyue Shu
  - Zhejiang University, Hangzhou, China
  - Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
  - Key Laboratory of Combined Multi-Organ Transplantation, Ministry of Public Health, Hangzhou, China
- Ting Ye
  - Zhejiang University, Hangzhou, China
- Min Zhang
  - Zhejiang University, Hangzhou, China
  - Division of Hepatobiliary and Pancreatic Surgery, Department of Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
  - Key Laboratory of Combined Multi-Organ Transplantation, Ministry of Public Health, Hangzhou, China
5
Wu Z, Shou L, Wang J, Huang T, Xu X. The Methylation Pattern for Knee and Hip Osteoarthritis. Front Cell Dev Biol 2020; 8:602024. [PMID: 33240895 PMCID: PMC7677303 DOI: 10.3389/fcell.2020.602024] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 09/02/2020] [Accepted: 10/22/2020] [Indexed: 01/08/2023] Open
Abstract
Osteoarthritis (OA) is one of the most prevalent chronic joint diseases among middle-aged and elderly people, but in recent years the number of young people suffering from the disease has increased quickly. OA is known to be a common degenerative disease caused by the combination and interaction of many factors, including natural and environmental factors. DNA methylation reflects the effects of environmental factors. Several studies of DNA methylation at specific genes in OA cartilage have indicated the great potential role of DNA methylation in OA. To systematically investigate the methylation pattern in knee and hip osteoarthritis, we analyzed the methylation profiles in cartilage of 16 OA hip samples, 19 control hip samples and 62 OA knee samples. Twelve discriminative methylation sites were identified using the minimal Redundancy Maximal Relevance (mRMR) and Incremental Feature Selection (IFS) methods. An SVM classifier built on these 12 methylation sites, from genes such as MEIS1, GABRG3, RXRA, and EN1, can perfectly classify the OA hip samples, control hip samples and OA knee samples when evaluated with leave-one-out cross-validation (LOOCV). These 12 methylation sites can not only serve as biomarkers but also provide insight into the underlying mechanism of OA.
Affiliation(s)
- Zhen Wu
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Lu Shou
  - Department of Pneumology, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Jian Wang
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai, China
- Xinwei Xu
  - Department of Orthopaedics, Tongde Hospital of Zhejiang Province, Hangzhou, China
6
Cheng Q, Li J, Fan F, Cao H, Dai ZY, Wang ZY, Feng SS. Identification and Analysis of Glioblastoma Biomarkers Based on Single Cell Sequencing. Front Bioeng Biotechnol 2020; 8:167. [PMID: 32195242 PMCID: PMC7066068 DOI: 10.3389/fbioe.2020.00167] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 12/19/2019] [Accepted: 02/19/2020] [Indexed: 12/16/2022] Open
Abstract
Glioblastoma (GBM) is one of the most common and aggressive primary adult brain tumors. Tumor heterogeneity, which is determined by both heterogeneous GBM cells and a complex tumor microenvironment, poses a great challenge to the treatment of GBM. Single-cell RNA sequencing (scRNA-seq) enables the transcriptomes of a great many individual cells to be assayed in an unbiased manner and has been applied in head and neck cancer, breast cancer, blood disease, and so on. In this study, based on the scRNA-seq results of infiltrating neoplastic cells in GBM, computational methods were applied to screen core biomarkers that can distinguish GBM tumor from the pericarcinomatous environment. The gene expression profiles of GBM from 2343 tumor cells and 1246 periphery cells were analyzed by maximum relevance minimum redundancy (mRMR). Upon further analysis of the feature lists yielded by the mRMR method, 31 important genes were extracted that may be essential biomarkers for GBM tumor cells. In addition, an optimal classification model using a support vector machine (SVM) algorithm as the classifier was built. Our results provide insights into GBM mechanisms and may be useful for GBM diagnosis and therapy.
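The mRMR ranking mentioned in this abstract greedily picks, at each step, the feature that is most informative about the class label while being least redundant with the features already chosen. The sketch below is one common formulation (relevance minus mean redundancy, both measured by mutual information on discretized values); whether the study used this exact variant is not stated, and the gene names and values are invented.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information in bits between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

def mrmr_rank(features, labels):
    """Rank features by relevance to the labels minus their
    mean redundancy with the features already selected."""
    relevance = {name: mutual_information(values, labels)
                 for name, values in features.items()}
    selected, remaining = [], set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Invented binarized expression values for eight cells.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "geneA": [0, 0, 0, 1, 1, 1, 1, 1],  # informative
    "geneB": [0, 0, 0, 1, 1, 1, 1, 1],  # exact copy of geneA, fully redundant
    "geneC": [0, 0, 1, 0, 1, 1, 0, 1],  # weaker but complementary
}
ranking = mrmr_rank(features, labels)
print(ranking)
```

In this toy case the duplicate `geneB` is pushed to the end of the ranking even though its relevance equals that of `geneA`, which is the behavior that distinguishes mRMR from ranking by relevance alone.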
Affiliation(s)
- Quan Cheng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
  - Department of Clinical Pharmacology, Xiangya Hospital, Central South University, Changsha, China
- Jing Li
  - Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
- Fan Fan
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Hui Cao
  - Department of Psychiatry, The Second People's Hospital of Hunan University of Chinese Medicine, Changsha, China
- Zi-Yu Dai
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Ze-Yu Wang
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
- Song-Shan Feng
  - Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
7
Yao Y, Gu Y, Yang M, Cao D, Wu F. The Gene Expression Biomarkers for Chronic Obstructive Pulmonary Disease and Interstitial Lung Disease. Front Genet 2019; 10:1154. [PMID: 31824564 PMCID: PMC6879656 DOI: 10.3389/fgene.2019.01154] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 08/01/2019] [Accepted: 10/22/2019] [Indexed: 01/01/2023] Open
Abstract
COPD (chronic obstructive pulmonary disease) and ILD (interstitial lung disease) are two common respiratory diseases. They share similar clinical traits but require different therapeutic treatments. Identifying the biomarkers that are differentially expressed between them will not only help the diagnosis of COPD and ILD, but also provide candidate drug targets that may facilitate the development of new treatments for COPD and ILD. Due to the irreversible complex pathological changes of COPD, there are very limited therapeutic options for COPD patients. In this study, we analyzed the gene expression profiles of two datasets: one training dataset that includes 144 COPD patients and 194 ILD patients, and one test dataset that includes 75 COPD patients and 61 ILD patients. Advanced feature selection methods, mRMR (minimal Redundancy Maximal Relevance) and incremental feature selection (IFS), were applied to identify a 38-gene biomarker. An SVM (support vector machine) classifier was built based on the 38-gene biomarker. Its accuracy, sensitivity, and specificity on the training dataset, evaluated by leave-one-out cross-validation, were 0.905, 0.896, and 0.912, respectively. On the independent test dataset, the accuracy, sensitivity, and specificity were similarly high: 0.904, 0.933, and 0.869, respectively. The biological function analysis of the 38 genes indicated that many of them can be potential treatment targets that may benefit COPD and ILD patients.
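The three test-set figures in this abstract follow directly from a 2x2 confusion matrix. The sketch below shows the arithmetic; the individual counts are not given in the abstract and are an assumption, back-calculated from the reported rates with COPD treated as the positive class (70 of 75 COPD and 53 of 61 ILD test samples classified correctly).

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (true-positive rate), and
    specificity (true-negative rate) from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Assumed counts for the 75 COPD (positive) and 61 ILD (negative) test samples.
test_metrics = binary_metrics(tp=70, fn=5, tn=53, fp=8)
rounded = {name: round(value, 3) for name, value in test_metrics.items()}
print(rounded)
```

These assumed counts reproduce the reported 0.904, 0.933, and 0.869 after rounding to three decimals, which suggests the three figures are mutually consistent.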
Affiliation(s)
- Yangwei Yao
  - Department of Pulmonary and Critical Care Medicine, The Second Hospital of Jiaxing, Jiaxing, China
- Yangyang Gu
  - Department of Pulmonary and Critical Care Medicine, The Second Hospital of Jiaxing, Jiaxing, China
- Meng Yang
  - Department of Pulmonary and Critical Care Medicine, The Second Hospital of Jiaxing, Jiaxing, China
- Dakui Cao
  - Department of Pulmonary and Critical Care Medicine, The Second Hospital of Jiaxing, Jiaxing, China
- Fengjie Wu
  - Department of Pulmonary and Critical Care Medicine, The Second Hospital of Jiaxing, Jiaxing, China
8
Chen L, Li D, Shao Y, Wang H, Liu Y, Zhang Y. Identifying Microbiota Signature and Functional Rules Associated With Bacterial Subtypes in Human Intestine. Front Genet 2019; 10:1146. [PMID: 31803234 PMCID: PMC6872643 DOI: 10.3389/fgene.2019.01146] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 07/17/2019] [Accepted: 10/21/2019] [Indexed: 12/12/2022] Open
Abstract
Gut microbiomes are integral microflora located in the human intestine with particular symbiosis. Among all microorganisms in the human intestine, bacteria are the most significant subgroup that contains many unique and functional species. The distribution patterns of bacteria in the human intestine not only reflect the different microenvironments in different sections of the intestine but also indicate that bacteria may have unique biological functions corresponding to their proper regions of the intestine. However, describing the functional differences between the bacterial subgroups and their distributions in different individuals is difficult using traditional computational approaches. Here, we first attempted to introduce four effective sets of bacterial features from independent databases. We then presented a novel computational approach to identify potential distinctive features among bacterial subgroups based on a systematic dataset on the gut microbiome from approximately 1,500 human gut bacterial strains. We also established a group of quantitative rules for explaining such distinctions. Results may reveal the microstructural characteristics of the intestinal flora and deepen our understanding on the regulatory role of bacterial subgroups in the human intestine.
Affiliation(s)
- Lijuan Chen
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Daojie Li
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Ye Shao
  - School of Medicine, Huaqiao University, Quanzhou, China
- Hui Wang
  - College of Animal Science and Technology, Anhui Agricultural University, Hefei, China
- Yuqing Liu
  - Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
- Yunhua Zhang
  - Anhui Province Key Laboratory of Farmland Ecological Conservation and Pollution Prevention, School of Resources and Environment, Anhui Agricultural University, Hefei, China
9
Zhang GL, Pan LL, Huang T, Wang JH. The transcriptome difference between colorectal tumor and normal tissues revealed by single-cell sequencing. J Cancer 2019; 10:5883-5890. [PMID: 31737124 PMCID: PMC6843882 DOI: 10.7150/jca.32267] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 12/14/2018] [Accepted: 06/17/2019] [Indexed: 12/29/2022] Open
Abstract
Previous cancer studies were difficult to reproduce because tumor tissues were analyzed directly, even though tumor tissue is actually a mixture of different cancer cells. The transcriptome of a single cell is much more robust than that of a mixed tissue and has much smaller variance. In this study, we analyzed the single-cell transcriptome of 272 colorectal cancer (CRC) epithelial cells and 160 normal epithelial cells and identified 342 discriminative transcripts using advanced machine learning methods. The most discriminative transcripts were LGALS4, PHGR1, C15orf48, HEPACAM2, PERP, FABP1, FCGBP, MT1G, TSPAN1 and CKB. We further clustered the 342 transcripts into two categories. The upregulated transcripts in CRC epithelial cells were significantly enriched in the Ribosome, Protein processing in endoplasmic reticulum, Antigen processing and presentation and p53 signaling pathways. The downregulated transcripts in CRC epithelial cells were significantly enriched in the Mineral absorption, Aldosterone-regulated sodium reabsorption and Oxidative phosphorylation pathways. The biological analysis of the discriminative transcripts revealed a possible mechanism of colorectal cancer.
Affiliation(s)
- Guo-Liang Zhang
  - Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Le-Lin Pan
  - Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
- Tao Huang
  - Shanghai Institute of Nutrition and Health, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
- Jin-Hai Wang
  - Department of Colorectal Surgery, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, Zhejiang, China
10
Pagel KA, Antaki D, Lian A, Mort M, Cooper DN, Sebat J, Iakoucheva LM, Mooney SD, Radivojac P. Pathogenicity and functional impact of non-frameshifting insertion/deletion variation in the human genome. PLoS Comput Biol 2019; 15:e1007112. [PMID: 31199787 PMCID: PMC6594643 DOI: 10.1371/journal.pcbi.1007112] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 11/16/2018] [Revised: 06/26/2019] [Accepted: 05/17/2019] [Indexed: 11/19/2022] Open
Abstract
Differentiation between phenotypically neutral and disease-causing genetic variation remains an open and relevant problem. Among different types of variation, non-frameshifting insertions and deletions (indels) represent an understudied group with widespread phenotypic consequences. To address this challenge, we present a machine learning method, MutPred-Indel, that predicts pathogenicity and identifies types of functional residues impacted by non-frameshifting insertion/deletion variation. The model shows good predictive performance as well as the ability to identify impacted structural and functional residues including secondary structure, intrinsic disorder, metal and macromolecular binding, post-translational modifications, allosteric sites, and catalytic residues. We identify structural and functional mechanisms impacted preferentially by germline variation from the Human Gene Mutation Database, recurrent somatic variation from COSMIC in the context of different cancers, as well as de novo variants from families with autism spectrum disorder. Further, the distributions of pathogenicity prediction scores generated by MutPred-Indel are shown to differentiate highly recurrent from non-recurrent somatic variation. Collectively, we present a framework to facilitate the interrogation of both pathogenicity and the functional effects of non-frameshifting insertion/deletion variants. The MutPred-Indel webserver is available at http://mutpred.mutdb.org/.
Affiliation(s)
- Kymberleigh A. Pagel
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
- Danny Antaki
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- AoJie Lian
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
  - Center for Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Matthew Mort
  - Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- David N. Cooper
  - Institute of Medical Genetics, Cardiff University, Cardiff, United Kingdom
- Jonathan Sebat
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Lilia M. Iakoucheva
  - Department of Psychiatry, University of California San Diego, La Jolla, California, United States of America
- Sean D. Mooney
  - Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, United States of America
- Predrag Radivojac
  - School of Informatics, Computing, and Engineering, Indiana University, Bloomington, Indiana, United States of America
  - Khoury College of Computer Sciences, Northeastern University, Boston, Massachusetts, United States of America
11
Deng M, Lv XD, Fang ZX, Xie XS, Chen WY. The blood transcriptional signature for active and latent tuberculosis. Infect Drug Resist 2019; 12:321-328. [PMID: 30787624 PMCID: PMC6363485 DOI: 10.2147/idr.s184640] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 01/14/2023] Open
Abstract
BACKGROUND: Although the incidence of tuberculosis (TB) has dropped substantially, it is still a serious threat to human health. In recent years, the emergence of resistant bacilli and inadequate disease control and prevention have led to a significant rise in the global TB epidemic. It is known that the cause of TB is Mycobacterium tuberculosis infection, but it is not clear why the infection is active in some patients and latent in others. METHODS: We analyzed the blood gene expression profiles of 69 latent TB patients and 54 active pulmonary TB patients from the GEO (Gene Expression Omnibus) database. RESULTS: By applying minimal redundancy maximal relevance and incremental feature selection, we identified 24 signature genes that can predict TB activation. The support vector machine predictor based on these 24 genes had a sensitivity of 0.907, a specificity of 0.913, and an accuracy of 0.911. Although they need to be validated in a large independent dataset, the biological analysis of these 24 genes showed great promise. CONCLUSION: We found that cytokine production was a key process during TB activation and that genes such as CYBB, TSPO, CD36, and STAT1 are worth further investigation.
Affiliation(s)
- Min Deng
  - Department of Infectious Diseases, The First Hospital of Jiaxing, The First Affiliated Hospital of Jiaxing University, Jiaxing 314000, China
- Xiao-Dong Lv
  - Department of Respiration, The First Hospital of Jiaxing, The First Affiliated Hospital of Jiaxing University, Jiaxing 314000, China
- Zhi-Xian Fang
  - Department of Respiration, The First Hospital of Jiaxing, The First Affiliated Hospital of Jiaxing University, Jiaxing 314000, China
- Xin-Sheng Xie
  - Department of Infectious Diseases, The First Hospital of Jiaxing, The First Affiliated Hospital of Jiaxing University, Jiaxing 314000, China
- Wen-Yu Chen
  - Department of Respiration, The First Hospital of Jiaxing, The First Affiliated Hospital of Jiaxing University, Jiaxing 314000, China
12
Sheng M, Dong Z, Xie Y. Identification of tumor-educated platelet biomarkers of non-small-cell lung cancer. Onco Targets Ther 2018; 11:8143-8151. [PMID: 30532555 PMCID: PMC6241732 DOI: 10.2147/ott.s177384] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND Lung cancer is a severe cancer with a high death rate. The 5-year survival rate for stage III lung cancer is much lower than that for stage I. Early detection and intervention can significantly increase the survival time of lung cancer patients. However, conventional lung cancer-screening methods, such as chest X-rays, sputum cytology, positron-emission tomography (PET), low-dose computed tomography (CT), magnetic resonance imaging, and gene-mutation, -methylation, and -expression biomarkers of lung tissue, are invasive, involve radiation, or are expensive. Liquid biopsy is non-invasive and does little harm to the body. It can reflect early-stage dysfunctions of tumorigenesis and enable early detection and intervention. METHODS In this study, we analyzed RNA-sequencing data of tumor-educated platelets (TEPs) in 402 non-small-cell lung cancer (NSCLC) patients and 231 healthy controls. A total of 48 biomarker genes were selected with advanced minimal-redundancy maximal-relevance and incremental feature-selection (IFS) methods. RESULTS A support vector machine (SVM) classifier based on the 48 biomarker genes accurately predicted NSCLC, with leave-one-out cross-validation (LOOCV) sensitivity, specificity, accuracy, and Matthews correlation coefficient of 0.925, 0.827, 0.889, and 0.760, respectively. Network analysis of the 48 genes revealed that the WASF1 actin cytoskeleton module, PRKAB2 kinase module, RSRC1 ribosomal protein module, PDHB carbohydrate-metabolism module, and three intermodule hubs (TPM2, MYL9, and PPP1R12C) may play important roles in NSCLC tumorigenesis and progression. CONCLUSION The 48-gene TEP liquid-biopsy biomarkers will facilitate early screening of NSCLC and prolong the survival of cancer patients.
Affiliation(s)
- Meiling Sheng
- Department of Respiration, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| | - Zhaohui Dong
- Department of Intensive Care Unit, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China
| | - Yanping Xie
- Department of Respiratory Medicine, First Hospital of Huzhou, First Affiliated Hospital of Huzhou University, Huzhou, Zhejiang 313000, China
| |
|
13
|
The early detection of asthma based on blood gene expression. Mol Biol Rep 2018; 46:217-223. [PMID: 30421126 DOI: 10.1007/s11033-018-4463-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/29/2018] [Accepted: 11/01/2018] [Indexed: 01/10/2023]
Abstract
Asthma is a complex heterogeneous disorder with a hereditary tendency. The most widely used therapy is inhalation of anti-inflammatory corticosteroids, but it has systemic side effects. If the chronic inflammation can be detected at an early stage, the dosage of corticosteroids can be kept low and the side effects avoided. Therefore, to discover early-stage blood biomarkers for asthma, we analyzed the gene expression profiles in the blood of 77 moderate asthma patients and 87 healthy controls. With the advanced feature selection methods minimal Redundancy Maximal Relevance and Incremental Feature Selection, we identified 31 genes, such as MYD88, ZFP36, CCR3 and CYP3A5, as the optimal asthma biomarkers. The sensitivity, specificity and accuracy of the 31-gene Support Vector Machine predictor, evaluated with Leave-One-Out Cross Validation, were 0.870, 0.816 and 0.841, respectively. A literature survey showed that many of the biomarker genes have asthma-associated functions. Our results provided not only easy-to-apply blood gene expression biomarkers for early detection of asthma, but also an explainable qualitative model with biological significance.
|
14
|
Lin H, Qiu X, Zhang B, Zhang J. Identification of the predictive genes for the response of colorectal cancer patients to FOLFOX therapy. Onco Targets Ther 2018; 11:5943-5955. [PMID: 30271178 PMCID: PMC6149834 DOI: 10.2147/ott.s167656] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 12/11/2022] Open
Abstract
Background Colorectal cancer is a malignant tumor with a high death rate. Chemotherapy, radiotherapy and surgery are the three common treatments for colorectal cancer. For early colorectal cancer patients, postoperative adjuvant chemotherapy can reduce the risk of recurrence. For advanced colorectal cancer patients, palliative chemotherapy can significantly improve quality of life and prolong survival. FOLFOX is one of the mainstream chemotherapies in colorectal cancer; however, its response rate is only about 50%. Methods To systematically investigate why some colorectal cancer patients respond to FOLFOX therapy while others do not, we searched all publicly available databases and combined three gene expression datasets of colorectal cancer patients treated with FOLFOX. With the advanced minimal redundancy maximal relevance and incremental feature selection methods, we identified the biomarker genes. Results A Support Vector Machine-based classifier was constructed to predict the response of colorectal cancer patients to FOLFOX therapy. Its accuracy, sensitivity and specificity were 0.854, 0.845 and 0.863, respectively. Conclusion The biological analysis of representative biomarker genes suggested that apoptosis and inflammation signaling pathways were essential for the response of colorectal cancer patients to FOLFOX chemotherapy.
Affiliation(s)
- Hengjun Lin
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| | - Xueke Qiu
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| | - Bo Zhang
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| | - Jichao Zhang
- Department of Tumor, Anus and Intestine, Jinhua People's Hospital, Jinhua, Zhejiang 321000, China
| |
|
15
|
Pan X, Hu X, Zhang YH, Chen L, Zhu L, Wan S, Huang T, Cai YD. Identification of the copy number variant biomarkers for breast cancer subtypes. Mol Genet Genomics 2018; 294:95-110. [PMID: 30203254 DOI: 10.1007/s00438-018-1488-4] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 03/21/2018] [Accepted: 09/03/2018] [Indexed: 01/07/2023]
Abstract
Breast cancer is a common and threatening malignant disease with multiple biological and clinical subtypes. It can be categorized into the subtypes luminal A, luminal B, Her2 positive, and basal-like. Copy number variants (CNVs) have been reported to be potential, and even better, biomarkers for cancer diagnosis than mRNA biomarkers, because they are considerably more stable and robust than gene expression. Thus, it is meaningful to detect the CNVs of different cancers. To identify CNV biomarkers for breast cancer subtypes, we integrated the CNV data of more than 2000 samples from two large breast cancer databases, METABRIC and The Cancer Genome Atlas (TCGA). A computational method based on Monte Carlo feature selection and incremental feature selection was proposed and tested to identify the distinctive core CNVs in different breast cancer subtypes. We identified the CNV genes that may contribute to breast cancer tumorigenesis and built a set of quantitative distinctive rules for recognizing the breast cancer subtypes. The Matthews correlation coefficient (MCC) was 0.515 in tenfold cross-validation on the METABRIC training set and 0.492 in the independent test on the TCGA dataset. The CNVs of PGAP3, GRB7, MIR4728, PNMT, STARD3, TCAP and ERBB2 were important for the accurate diagnosis of breast cancer subtypes. The findings reported in this study may further uncover the differences between breast cancer subtypes and improve diagnostic accuracy.
Affiliation(s)
- Xiaoyong Pan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China; Department of Medical Informatics, Erasmus MC, Rotterdam, The Netherlands
| | - XiaoHua Hu
- Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, 200438, People's Republic of China
| | - Yu-Hang Zhang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
| | - Lei Chen
- College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China; Shanghai Key Laboratory of PMMP, East China Normal University, Shanghai, 200241, People's Republic of China
| | - LiuCun Zhu
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - ShiBao Wan
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China.
| | - Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, People's Republic of China.
| |
|
16
|
Li J, Lan CN, Kong Y, Feng SS, Huang T. Identification and Analysis of Blood Gene Expression Signature for Osteoarthritis With Advanced Feature Selection Methods. Front Genet 2018; 9:246. [PMID: 30214455 PMCID: PMC6125376 DOI: 10.3389/fgene.2018.00246] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 05/03/2018] [Accepted: 06/22/2018] [Indexed: 12/15/2022] Open
Abstract
Osteoarthritis (OA) is a complex disease that affects articular joints and may cause disability. The incidence of OA is extremely high, and most elderly people have symptoms of osteoarthritis. Physiotherapy for OA is time consuming, and the chances of full recovery are minimal. The most effective way of fighting OA is early diagnosis and early intervention. Liquid biopsy has become a popular noninvasive test. To find a blood gene expression signature for OA, we reanalyzed the publicly available blood gene expression profiles of 106 patients with OA and 33 control samples using an automatic computational pipeline based on advanced feature selection methods. A compact 23-gene set was identified. On the basis of these 23 genes, we constructed a Support Vector Machine (SVM) classifier and evaluated it with leave-one-out cross-validation. Its sensitivity (Sn), specificity (Sp), accuracy (ACC), and Matthews correlation coefficient (MCC) were 0.991, 0.909, 0.971, and 0.920, respectively. The performance still needs to be validated in a large independent dataset, but the in-depth biological analysis of the 23 biomarkers showed great promise and suggested that the mRNA surveillance pathway and multicellular organism growth play important roles in OA. Our results shed light on OA diagnosis through liquid biopsy.
Affiliation(s)
- Jing Li
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Chun-Na Lan
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Ying Kong
- Department of Rehabilitation, The Second Xiangya Hospital, Central South University, Changsha, China
| | - Song-Shan Feng
- Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
| |
|
17
|
Niu M, Li Y, Wang C, Han K. RFAmyloid: A Web Server for Predicting Amyloid Proteins. Int J Mol Sci 2018; 19:ijms19072071. [PMID: 30013015 PMCID: PMC6073578 DOI: 10.3390/ijms19072071] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 06/12/2018] [Revised: 07/10/2018] [Accepted: 07/12/2018] [Indexed: 12/22/2022] Open
Abstract
Amyloid is an insoluble fibrous protein and its mis-aggregation can lead to diseases such as Alzheimer’s disease and Creutzfeldt–Jakob disease. Therefore, the identification of amyloid is essential for the discovery and understanding of disease. We established a novel random forest-based predictor called RFAmy to identify amyloid. It employs the SVMProt 188-D feature extraction method, based on protein composition and physicochemical properties, and the pse-in-one feature extraction method, based on amino acid composition, autocorrelation pseudo acid composition, profile-based features and predicted structure features. In the ten-fold cross-validation test, RFAmy’s overall accuracy was 89.19% and its F-measure was 0.891. Comparison experiments with other features, classifiers, and existing methods show the effectiveness of RFAmy in predicting amyloid proteins. The RFAmy proposed in this paper can be accessed through the URL http://server.malab.cn/RFAmyloid/.
Affiliation(s)
- Mengting Niu
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| | - Yanjuan Li
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China.
| | - Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150040, China.
| | - Ke Han
- School of Computer and Information Engineering, Harbin University of Commerce, Harbin 150040, China.
| |
|
18
|
Zhang TM, Huang T, Wang RF. Cross talk of chromosome instability, CpG island methylator phenotype and mismatch repair in colorectal cancer. Oncol Lett 2018; 16:1736-1746. [PMID: 30008861 PMCID: PMC6036478 DOI: 10.3892/ol.2018.8860] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 12/12/2017] [Accepted: 05/22/2018] [Indexed: 12/20/2022] Open
Abstract
Colorectal cancer is a severe cancer associated with a high prevalence and fatality rate. There are three major mechanisms for colorectal cancer: (1) chromosome instability (CIN), (2) CpG island methylator phenotype (CIMP) and (3) mismatch repair (MMR), of which CIN is the most common type. However, these subtypes are not mutually exclusive and overlap. To investigate their biological mechanisms and cross talk, the gene expression profiles of 585 colorectal cancer patients with CIN, CIMP and MMR status records were collected. By comparing the CIN+ and CIN- samples, CIMP+ and CIMP- samples, and MMR+ and MMR- samples with the minimal redundancy maximal relevance (mRMR) and incremental feature selection (IFS) methods, the CIN-, CIMP- and MMR-associated genes were selected. There was little direct overlap among them. To investigate their indirect interactions, downstream genes of CIN, CIMP and MMR were identified using the random walk with restart (RWR) method, and a greater overlap of downstream genes was found. The common downstream genes were involved in biosynthetic and metabolic pathways. These findings were consistent with the clinical observation of wide-ranging metabolite aberrations in colorectal cancer. To conclude, the present study not only gave a gene-level explanation of CIN, CIMP and MMR, but also showed their network-level cross talk. The common genes of CIN, CIMP and MMR may be useful for cross-subtype general colorectal cancer drug development.
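The random walk with restart step mentioned above can be illustrated with a minimal Python sketch. It is a generic version of the technique, not the authors' implementation: the abstract does not state the network, the restart probability, or the convergence criterion, so the toy four-node graph, the node names, `restart=0.7`, and the function name `random_walk_with_restart` are all assumptions made for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adj   : dict mapping each node to a list of its neighbours
            (undirected, every node with at least one neighbour)
    seeds : set of nodes that receive the restart mass, e.g. the genes
            selected for one subtype
    W is the column-normalised adjacency matrix, applied implicitly below.
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: jump back to the seed set with probability `restart`.
        nxt = {n: restart * p0[n] for n in nodes}
        # Walk term: each node spreads its mass evenly over its neighbours.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path graph A - B - C - D with a single seed node A.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, seeds={"A"})
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes with high steady-state scores are those most reachable from the seeds, which is the sense in which RWR ranks candidate "downstream" genes in a network.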
Affiliation(s)
- Tian-Ming Zhang
- Department of Colorectal and Anal Surgery, Jinhua Hospital of Zhejiang University, Jinhua, Zhejiang 321000, P.R. China
| | - Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, P.R. China
| | - Rong-Fei Wang
- Department of Colorectal and Anal Surgery, Jinhua People's Hospital, Jinhua, Zhejiang 321000, P.R. China
| |
|
19
|
Huang T, Shu Y, Cai YD. Genetic differences among ethnic groups. BMC Genomics 2015; 16:1093. [PMID: 26690364 PMCID: PMC4687076 DOI: 10.1186/s12864-015-2328-0] [Citation(s) in RCA: 108] [Impact Index Per Article: 10.8] [Received: 05/22/2015] [Accepted: 12/15/2015] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Many differences between ethnic groups have been observed, such as skin color, eye color, height, susceptibility to some diseases, and response to certain drugs. However, the genetic bases of such differences have been under-investigated. Since the HapMap project, large-scale genotype data from Caucasian, African and Asian population samples have been available. The project found that these populations were located in different areas of the PCA (Principal Component Analysis) plot. However, as an unsupervised method, PCA does not measure the differences in each single nucleotide polymorphism (SNP) among populations. RESULTS We applied an advanced mutual information-based feature selection method to detect associations between SNP status and ethnic groups using the latest HapMap Phase 3 release version 3, which included more sub-populations. A total of 299 SNPs were identified, and they accurately predicted the ethnicity of all HapMap populations. The 10-fold cross-validation accuracy of the SMO (sequential minimal optimization) model on the training dataset was 0.901, and the accuracy on the independent test dataset was 0.895. CONCLUSIONS In-depth functional analysis of these SNPs and their nearby genes revealed the genetic bases of skin and eye color differences among populations.
Affiliation(s)
- Tao Huang
- Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, P. R. China.
| | - Yang Shu
- State Key Laboratory of Biotherapy, Sichuan University, Sichuan, 610041, P. R. China.
| | - Yu-Dong Cai
- College of Life Science, Shanghai University, Shanghai, 200444, P. R. China.
| |
|
20
|
Computational approaches to study the effects of small genomic variations. J Mol Model 2015; 21:251. [PMID: 26350246 DOI: 10.1007/s00894-015-2794-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 04/08/2015] [Accepted: 08/23/2015] [Indexed: 10/23/2022]
Abstract
Advances in DNA sequencing technologies have led to an avalanche-like increase in the number of gene sequences deposited in public databases over the last decade as well as the detection of an enormous number of previously unseen nucleotide variants therein. Given the size and complex nature of the genome-wide sequence variation data, as well as the rate of data generation, experimental characterization of the disease association of each of these variations or their effects on protein structure/function would be costly, laborious, time-consuming, and essentially impossible. Thus, in silico methods to predict the functional effects of sequence variations are constantly being developed. In this review, we summarize the major computational approaches and tools that are aimed at the prediction of the functional effect of mutations, and describe the state-of-the-art databases that can be used to obtain information about mutation significance. We also discuss future directions in this highly competitive field.
|