Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Macintyre G, Jimeno Yepes A, Ong CS, Verspoor K. Associating disease-related genetic variants in intergenic regions to the genes they impact. PeerJ 2014;2:e639. [PMID: 25374782 PMCID: PMC4217187 DOI: 10.7717/peerj.639] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 05/23/2014] [Accepted: 10/07/2014] [Indexed: 11/20/2022] Open

Cited by Other Article(s)

Wang L, You ZH, Huang DS, Li JQ. MGRCDA: Metagraph Recommendation Method for Predicting CircRNA-Disease Association. IEEE Transactions on Cybernetics 2023;53:67-75. [PMID: 34236991 DOI: 10.1109/tcyb.2021.3090756] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Indexed: 06/13/2023]

Nguyen QH, Ngo HH, Nguyen-Vo TH, Do TT, Rahardja S, Nguyen BP. eMIC-AntiKP: Estimating minimum inhibitory concentrations of antibiotics towards Klebsiella pneumoniae using deep learning. Comput Struct Biotechnol J 2022;21:751-757. [PMID: 36659924 PMCID: PMC9827358 DOI: 10.1016/j.csbj.2022.12.041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/06/2022] [Revised: 12/22/2022] [Accepted: 12/23/2022] [Indexed: 12/27/2022] Open

Wang L, You ZH, Li JQ, Huang YA. IMS-CDA: Prediction of CircRNA-Disease Associations From the Integration of Multisource Similarity Information With Deep Stacked Autoencoder Model. IEEE Transactions on Cybernetics 2021;51:5522-5531. [PMID: 33027025 DOI: 10.1109/tcyb.2020.3022852] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 06/11/2023]

Wang L, You ZH, Zhou X, Yan X, Li HY, Huang YA. NMFCDA: Combining randomization-based neural network with non-negative matrix factorization for predicting CircRNA-disease association. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107629] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/21/2022]

The Identification of the SARS-CoV-2 Whole Genome: Nine Cases Among Patients in Banten Province, Indonesia. Journal of Pure and Applied Microbiology 2021. [DOI: 10.22207/jpam.15.2.52] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open

Wang L, Yan X, You ZH, Zhou X, Li HY, Huang YA. SGANRDA: semi-supervised generative adversarial networks for predicting circRNA-disease associations. Brief Bioinform 2021;22:6175330. [PMID: 33734296 DOI: 10.1093/bib/bbab028] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 11/09/2020] [Revised: 01/18/2021] [Accepted: 01/19/2021] [Indexed: 12/31/2022] Open

Wang L, You ZH, Li YM, Zheng K, Huang YA. GCNCDA: A new method for predicting circRNA-disease associations based on Graph Convolutional Network Algorithm. PLoS Comput Biol 2020;16:e1007568. [PMID: 32433655 PMCID: PMC7266350 DOI: 10.1371/journal.pcbi.1007568] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 11/22/2019] [Revised: 06/02/2020] [Accepted: 03/23/2020] [Indexed: 01/22/2023] Open

Abstract

Numerous evidences indicate that Circular RNAs (circRNAs) are widely involved in the occurrence and development of diseases. Identifying the association between circRNAs and diseases plays a crucial role in exploring the pathogenesis of complex diseases and improving the diagnosis and treatment of diseases. However, due to the complex mechanisms between circRNAs and diseases, it is expensive and time-consuming to discover the new circRNA-disease associations by biological experiment. Therefore, there is increasingly urgent need for utilizing the computational methods to predict novel circRNA-disease associations. In this study, we propose a computational method called GCNCDA based on the deep learning Fast learning with Graph Convolutional Networks (FastGCN) algorithm to predict the potential disease-associated circRNAs. Specifically, the method first forms the unified descriptor by fusing disease semantic similarity information, disease and circRNA Gaussian Interaction Profile (GIP) kernel similarity information based on known circRNA-disease associations. The FastGCN algorithm is then used to objectively extract the high-level features contained in the fusion descriptor. Finally, the new circRNA-disease associations are accurately predicted by the Forest by Penalizing Attributes (Forest PA) classifier. The 5-fold cross-validation experiment of GCNCDA achieved 91.2% accuracy with 92.78% sensitivity at the AUC of 90.90% on circR2Disease benchmark dataset. In comparison with different classifier models, feature extraction models and other state-of-the-art methods, GCNCDA shows strong competitiveness. Furthermore, we conducted case study experiments on diseases including breast cancer, glioma and colorectal cancer. The results showed that 16, 15 and 17 of the top 20 candidate circRNAs with the highest prediction scores were respectively confirmed by relevant literature and databases. 
These results suggest that GCNCDA can effectively predict potential circRNA-disease associations and provide highly credible candidates for biological experiments.

The recognition of circRNA-disease association is the key of disease diagnosis and treatment, and it is of great significance for exploring the pathogenesis of complex diseases. Computational methods can predict the potential disease-related circRNAs quickly and accurately. Based on the hypothesis that circRNA with similar function tends to associate with similar disease, GCNCDA model is proposed to effectively predict the potential association between circRNAs and diseases by combining FastGCN algorithm. The performance of the model was verified by cross-validation experiments, different feature extraction algorithm and classifier models comparison experiments. Furthermore, 16, 15 and 17 of the top 20 candidate circRNAs with the highest prediction scores in disease including breast cancer, glioma and colorectal cancer were respectively confirmed by relevant literature and databases. It is anticipated that GCNCDA model can give priority to the most promising circRNA-disease associations on a large scale to provide reliable candidates for further biological experiments.
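The Gaussian Interaction Profile (GIP) kernel similarity mentioned in the abstract above can be illustrated with a minimal sketch. This is not the GCNCDA authors' code: it assumes the commonly used GIP formulation, in which the similarity of two entities is exp(-gamma * squared distance between their interaction profiles) and gamma is a bandwidth parameter divided by the mean squared norm of all profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy association matrix are hypothetical and chosen only for illustration.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Return the GIP kernel similarity matrix for the rows of a binary
    association matrix (assumed formulation, not taken from the paper)."""
    n = len(profiles)
    # Squared Euclidean norm of each interaction profile (row).
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    # Normalise the bandwidth by the average profile norm.
    gamma = gamma_prime / mean_sq
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist_sq = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * dist_sq)
    return sim


# Hypothetical toy data: rows are circRNAs, columns are diseases,
# 1 marks a known circRNA-disease association.
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]

# circRNA similarity uses the rows; disease similarity uses the columns.
circ_sim = gip_kernel(associations)
disease_sim = gip_kernel([list(col) for col in zip(*associations)])
```

In this toy matrix the first two circRNAs share an identical profile, so their similarity is exactly 1, while the third has no disease in common with them and scores much lower. Under the assumptions above, matrices of this kind are what a method would fuse with disease semantic similarity before feature extraction.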

Wang L, You ZH, Huang YA, Huang DS, Chan KCC. An efficient approach based on multi-sources information to predict circRNA–disease associations using deep convolutional neural network. Bioinformatics 2019;36:4038-4046. [DOI: 10.1093/bioinformatics/btz825] [Citation(s) in RCA: 64] [Impact Index Per Article: 12.8] [Received: 02/09/2019] [Revised: 10/07/2019] [Accepted: 11/21/2019] [Indexed: 12/16/2022] Open

Abstract

Motivation: Emerging evidence indicates that circular RNA (circRNA) plays a crucial role in human disease. Using circRNA as biomarker gives rise to a new perspective regarding our diagnosing of diseases and understanding of disease pathogenesis. However, detection of circRNA–disease associations by biological experiments alone is often blind, limited to small scale, high cost and time consuming. Therefore, there is an urgent need for reliable computational methods to rapidly infer the potential circRNA–disease associations on a large scale and to provide the most promising candidates for biological experiments.

Results: In this article, we propose an efficient computational method based on multi-source information combined with deep convolutional neural network (CNN) to predict circRNA–disease associations. The method first fuses multi-source information including disease semantic similarity, disease Gaussian interaction profile kernel similarity and circRNA Gaussian interaction profile kernel similarity, and then extracts its hidden deep feature through the CNN and finally sends them to the extreme learning machine classifier for prediction. The 5-fold cross-validation results show that the proposed method achieves 87.21% prediction accuracy with 88.50% sensitivity at the area under the curve of 86.67% on the CIRCR2Disease dataset. In comparison with the state-of-the-art SVM classifier and other feature extraction methods on the same dataset, the proposed model achieves the best results. In addition, we also obtained experimental support for prediction results by searching published literature. As a result, 7 of the top 15 circRNA–disease pairs with the highest scores were confirmed by literature. These results demonstrate that the proposed model is a suitable method for predicting circRNA–disease associations and can provide reliable candidates for biological experiments.

Availability and implementation: The source code and datasets explored in this work are available at https://github.com/look0012/circRNA-Disease-association.

Supplementary information: Supplementary data are available at Bioinformatics online.

Alanazi IO, Al Shehri ZS, Ebrahimie E, Giahi H, Mohammadi-Dehcheshmeh M. Non-coding and coding genomic variants distinguish prostate cancer, castration-resistant prostate cancer, familial prostate cancer, and metastatic castration-resistant prostate cancer from each other. Mol Carcinog 2019;58:862-874. [PMID: 30644608 DOI: 10.1002/mc.22975] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 01/07/2019] [Accepted: 01/08/2019] [Indexed: 12/11/2022]

Abstract

A considerable number of deposited variants has provided new possibilities for knowledge discovery in different types of prostate cancer. Here, we analyzed variants located on 3'UTR, 5'UTR, CDs, Intergenic, and Intronic regions in castration-resistant prostate cancer (8496 variants), familial prostate cancer (3241 variants), metastatic castration-resistant prostate cancer (3693 variants), and prostate cancer (16599 variants). Chromosome regions 10p15-p14 and 2p13 were highly enriched (P < 0.00001) for variants located in 3'UTR, 5'UTR, CDs, intergenic, and intronic regions in castration-resistant prostate cancer. In contrast, 10p15-p14, 10q23.3, 12q13.11, 13q12.3, 1q25, and 8p22 regions were enriched (P < 0.001) in familial prostate cancer. In metastatic castration-resistant prostate cancer, 10p15-p14, 10q23.3, 11q22-q23, 14q21.1, and 14q32.13 were highly variant regions (P < 0.001). Chromosome 2 and chromosome 1 hosted many enriched variant regions. AKR1C3, BRCA1, BRCA2, CHGA, CYP19A1, HOXB13, KLK3, and PTEN contained the highest number of 3'UTR, 5'UTR, CDs, Intergenic, and Intronic variants. Network analysis showed that these genes are upstream of important functions including prostate gland development, tumor recurrence, prostate cancer-specific survival, tumor progression, cancer mortality, long-term survival, cancer recurrence, angiogenesis, and AR. Interestingly, all of EGFR, JAK2, NR3C1, PDZD2, and SEMA3C genes had single nucleotide polymorphisms (SNP) in castration-resistant prostate cancer, consistent with high selection pressure on these genes during drug treatment and consequent resistance. High occurrence of variants in 3'UTRs suggests the importance of regulatory variants in different types of prostate cancer; an area that has been neglected compared with coding variants. This study provides a comprehensive overview of genomic regions contributing to different types of prostate cancer.

The rs13388259 Intergenic Polymorphism in the Genomic Context of the BCYRN1 Gene Is Associated with Parkinson's Disease in the Hungarian Population. Parkinson's Disease 2018;2018:9351598. [PMID: 29850016 PMCID: PMC5903343 DOI: 10.1155/2018/9351598] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/18/2017] [Accepted: 03/12/2018] [Indexed: 11/17/2022]

Boudellioua I, Mahamad Razali RB, Kulmanov M, Hashish Y, Bajic VB, Goncalves-Serra E, Schoenmakers N, Gkoutos GV, Schofield PN, Hoehndorf R. Semantic prioritization of novel causative genomic variants. PLoS Comput Biol 2017;13:e1005500. [PMID: 28414800 PMCID: PMC5411092 DOI: 10.1371/journal.pcbi.1005500] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 11/08/2016] [Revised: 05/01/2017] [Accepted: 04/04/2017] [Indexed: 12/14/2022] Open

Abstract

Discriminating the causative disease variant(s) for individuals with inherited or de novo mutations presents one of the main challenges faced by the clinical genetics community today. Computational approaches for variant prioritization include machine learning methods utilizing a large number of features, including molecular information, interaction networks, or phenotypes. Here, we demonstrate the PhenomeNET Variant Predictor (PVP) system that exploits semantic technologies and automated reasoning over genotype-phenotype relations to filter and prioritize variants in whole exome and whole genome sequencing datasets. We demonstrate the performance of PVP in identifying causative variants on a large number of synthetic whole exome and whole genome sequences, covering a wide range of diseases and syndromes. In a retrospective study, we further illustrate the application of PVP for the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism. We find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

We address the problem of how to distinguish which of the many thousands of DNA sequence variants carried by an individual with a rare disease is responsible for the disease phenotypes. This can help clinicians arrive at a diagnosis, but also can be instrumental in improving our understanding of the pathobiology of the disease. Many methods are currently available to help with the problem of determining causative variant, using information about evolutionary conservation and prediction of the functional consequences of the sequence variant. We have developed a novel algorithm (PVP) which augments existing strategies by using the similarity of the patient’s phenotype to known phenotype-genotype data in human and model organism databases to further rank potential candidate genes. In a retrospective study, we apply PVP to the interpretation of whole exome sequencing data in patients suffering from congenital hypothyroidism, and find that PVP accurately identifies causative variants in whole exome and whole genome sequencing datasets and provides a powerful resource for the discovery of causal variants.

Affiliation(s)

Imane Boudellioua: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Rozaimi B. Mahamad Razali: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Maxat Kulmanov: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Yasmeen Hashish: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Vladimir B. Bajic: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia
Eva Goncalves-Serra: Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, United Kingdom
Nadia Schoenmakers: University of Cambridge Metabolic Research Laboratories, Wellcome Trust—Medical Research Council, Institute of Metabolic Science, Addenbrooke’s Hospital, Cambridge, United Kingdom
Georgios V. Gkoutos: College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, Birmingham, United Kingdom; Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, Birmingham, United Kingdom; Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, United Kingdom
Paul N. Schofield: Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, United Kingdom
Robert Hoehndorf: King Abdullah University of Science and Technology, Computer, Electrical & Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, Thuwal, Saudi Arabia

Singhal A, Simmons M, Lu Z. Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine. PLoS Comput Biol 2016;12:e1005017. [PMID: 27902695 PMCID: PMC5130168 DOI: 10.1371/journal.pcbi.1005017] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 02/09/2016] [Accepted: 06/04/2016] [Indexed: 11/23/2022] Open

Abstract

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting such triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F₁-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. 
Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.

To provide personalized health care it is important to understand patients’ genomic variations and the effect these variants have in protecting or predisposing patients to disease. Several projects aim at providing this information by manually curating such genotype-phenotype relationships in organized databases using data from clinical trials and biomedical literature. However, the exponentially increasing size of biomedical literature and the limited ability of manual curators to discover the genotype-phenotype relationships “hidden” in text has led to delays in keeping such databases updated with the current findings. The result is a bottleneck in leveraging valuable information that is currently available to develop personalized health care solutions. In the past, a few computational techniques have attempted to speed up the curation efforts by using text mining techniques to automatically mine genotype-phenotype information from biomedical literature. However, such computational approaches have not been able to achieve accuracy levels sufficient to make them appealing for practical use. In this work, we present a highly accurate machine-learning-based text mining approach for mining complete genotype-phenotype relationships from biomedical literature. We test the performance of this approach on ten well-known diseases and demonstrate the validity of our approach and its potential utility for practical purposes. We are currently working towards generating genotype-phenotype relationships for all PubMed data with the goal of developing an exhaustive database of all the known diseases in life science. We believe that this work will provide very important and needed support for implementation of personalized health care using genomic data.

Associations of Genetic Variants at Nongenic Susceptibility Loci with Breast Cancer Risk and Heterogeneity by Tumor Subtype in Southern Han Chinese Women. BioMed Research International 2016;2016:3065493. [PMID: 27022606 PMCID: PMC4789034 DOI: 10.1155/2016/3065493] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/17/2015] [Revised: 01/06/2016] [Accepted: 02/04/2016] [Indexed: 12/05/2022]

Hamed AA, Ayer AA, Clark EM, Irons EA, Taylor GT, Zia A. Measuring climate change on Twitter using Google’s algorithm: perception and events. International Journal of Web Information Systems 2015. [DOI: 10.1108/ijwis-08-2015-0025] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/17/2022]

Abstract

Purpose – The purpose of this paper is to test the hypothesis of whether more complex and emergent hashtags can be sufficient pointers to climate change events. Human-induced climate change is one of this century’s greatest unbalancing forces to have affected our planet. Capturing the public awareness of climate change on Twitter has proven to be significant. In a previous research, it was demonstrated by the authors that public awareness is prominently expressed in the form of hashtags that uses more than one bigram (i.e. a climate change term). The research finding showed that this awareness is expressed by more complex terms (e.g. “climate change”). It was learned that the awareness was dominantly expressed using the hashtag: #ClimateChange.

Design/methodology/approach – The methods demonstrated here use objective computational approaches [i.e. Google’s ranking algorithm and Information Retrieval measures (e.g. TFIDF)] to detect and rank the emerging events.

Findings – The results shows a clear significant evidence for the events signaled using emergent hashtags and how globally influential they are. The research detected the Earth Day, 2015, which was signaled using the hashtag #EarthDay. Clearly, this is a day that is globally observed by the worldwide population.

Originality/value – It was proven that these computational methods eliminate the subjectivity errors associated with humans and provide inexpensive solution for event detection on Twitter. Indeed, the approach used here can also be applicable to other types of event detections, beyond climate change, and surely applicable to other social media platforms that support the use of hashtags (e.g. Facebook). The paper explains, in great detail, the methods and all the numerous events detected.

Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015;17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/14/2015] [Indexed: 12/20/2022] Open

Abstract

Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
