1
Shu J, Li Y, Wang S, Xi B, Ma J. Disease gene prediction with privileged information and heteroscedastic dropout. Bioinformatics 2021; 37:i410-i417. [PMID: 34252957 PMCID: PMC8275341 DOI: 10.1093/bioinformatics/btab310] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Accepted: 04/24/2021] [Indexed: 11/19/2022] Open
Abstract
Motivation Recently, machine learning models have achieved tremendous success in prioritizing candidate genes for genetic diseases. These models are able to accurately quantify the similarity among diseases and genes based on the intuition that similar genes are more likely to be associated with similar diseases. However, the genetic features these methods rely on are often hard to collect due to high experimental cost and various other technical limitations. Existing solutions to this problem significantly increase the risk of overfitting and decrease the generalizability of the models. Results In this work, we propose a graph neural network (GNN) version of the Learning under Privileged Information paradigm to predict new disease-gene associations. Unlike previous gene prioritization approaches, our model does not require the genetic features to be the same at training and test stages. If a genetic feature is hard to measure and therefore missing at the test stage, our model could still efficiently incorporate its information during the training process. To implement this, we develop a Heteroscedastic Gaussian Dropout algorithm, where the dropout probability of the GNN model is determined by another GNN model with a mirrored GNN architecture. To evaluate our method, we compared it with four state-of-the-art methods on the Online Mendelian Inheritance in Man dataset to prioritize candidate disease genes. Extensive evaluations show that our model could improve the prediction accuracy when all the features are available compared to other methods. More importantly, our model could make very accurate predictions when >90% of the features are missing at the test stage. Availability and implementation Our method is implemented with Python 3.7 and PyTorch 1.5.0, and the method and data are freely available at: https://github.com/juanshu30/Disease-Gene-Prioritization-with-Privileged-Information-and-Heteroscedastic-Dropout.
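The heteroscedastic Gaussian dropout idea in this abstract can be illustrated with a minimal Python sketch. It assumes multiplicative noise drawn from N(1, sigma^2), where sigma is computed from the privileged features by a stand-in function. The names `noise_scale` and `heteroscedastic_gaussian_dropout`, and the linear-plus-softplus form of the noise model, are hypothetical; they are not taken from the authors' repository, which uses mirrored GNNs in PyTorch.

```python
import math
import random


def softplus(z):
    """Smooth positive function; keeps the noise scale above zero."""
    return math.log1p(math.exp(z))


def noise_scale(privileged, weights, bias=0.0):
    """Hypothetical stand-in for the second (mirrored) network: maps
    privileged features to a positive, input-dependent standard deviation."""
    z = sum(w * p for w, p in zip(weights, privileged)) + bias
    return softplus(z)


def heteroscedastic_gaussian_dropout(hidden, privileged, weights, rng, training=True):
    """Multiply each hidden unit by noise drawn from N(1, sigma^2).

    sigma depends on the privileged features, so the noise level varies
    per sample (heteroscedastic). At test time no noise is applied, so
    the privileged features are not needed.
    """
    if not training:
        return list(hidden)
    sigma = noise_scale(privileged, weights)
    return [h * rng.gauss(1.0, sigma) for h in hidden]


rng = random.Random(0)            # fixed seed for reproducibility
hidden = [0.5, -1.0, 2.0]         # toy hidden activations
privileged = [1.0, 0.0]           # toy privileged features (training only)
weights = [0.3, -0.2]             # toy parameters of the noise model

noisy = heteroscedastic_gaussian_dropout(hidden, privileged, weights, rng)
clean = heteroscedastic_gaussian_dropout(hidden, privileged, weights, rng, training=False)
```

The design point the sketch tries to capture is that privileged information only shapes the training-time noise, which is why the model can still be evaluated when those features are missing.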
Affiliation(s)
- Juan Shu
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Yu Li
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 999077, China
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA
- Bowei Xi
  - Department of Statistics, Purdue University, West Lafayette, IN 47906, USA
- Jianzhu Ma
  - Institute for Artificial Intelligence, Peking University, Beijing 100871, China
2
Zakeri P, Simm J, Arany A, ElShal S, Moreau Y. Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information. Bioinformatics 2019; 34:i447-i456. [PMID: 29949967 PMCID: PMC6022676 DOI: 10.1093/bioinformatics/bty289] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Indexed: 01/23/2023] Open
Abstract
Motivation Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known. Results Our gene prioritization method can innovatively not only integrate data sources describing genes, but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that our proposed model can effectively improve accuracy over the well-established gene prioritization method, Endeavour. In particular, our proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities, when compared to Endeavour. Availability and implementation The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pooya Zakeri
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
- Jaak Simm
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
- Adam Arany
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
- Sarah ElShal
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
- Yves Moreau
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
3
Godard P, Page M. PCAN: phenotype consensus analysis to support disease-gene association. BMC Bioinformatics 2016; 17:518. [PMID: 27923364 PMCID: PMC5142268 DOI: 10.1186/s12859-016-1401-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 05/23/2016] [Accepted: 12/01/2016] [Indexed: 11/12/2022] Open
Abstract
Background Bridging genotype and phenotype is a fundamental biomedical challenge that underlies more effective target discovery and patient-tailored therapy. Approaches that can flexibly and intuitively integrate known gene-phenotype associations in the context of molecular signaling networks are vital to effectively prioritize and biologically interpret genes underlying disease traits of interest. Results We describe Phenotype Consensus Analysis (PCAN), a method to assess the consensus semantic similarity of phenotypes in a candidate gene’s signaling neighborhood. We demonstrate that significant phenotype consensus (p < 0.05) is observable for ~67% of 4,549 OMIM disease-gene associations, using a combination of high-quality String interactions + Metabase pathways, and use Joubert Syndrome to demonstrate the ease with which a significant result can be interrogated to highlight discriminatory traits linked to mechanistically related genes. Conclusions We advocate phenotype consensus as an intuitive and versatile method to aid disease-gene association, which naturally lends itself to the mechanistic deconvolution of diverse phenotypes. We provide PCAN to the community as an R package (http://bioconductor.org/packages/PCAN/) to allow flexible configuration, extension and standalone use or integration to supplement existing gene prioritization workflows. Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1401-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Patrice Godard
  - Clarivate Analytics (formerly the IP & Science business of Thomson Reuters), 5901 Priestly Dr., #200, Carlsbad, CA, 92008, USA
- Matthew Page
  - Translational Bioinformatics, UCB Pharma, 208 Bath Road, Slough, SL1 3WE, UK.
4
Li J, Lin X, Teng Y, Qi S, Xiao D, Zhang J, Kang Y. A Comprehensive Evaluation of Disease Phenotype Networks for Gene Prioritization. PLoS One 2016; 11:e0159457. [PMID: 27415759 PMCID: PMC4944959 DOI: 10.1371/journal.pone.0159457] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 02/03/2016] [Accepted: 07/01/2016] [Indexed: 12/31/2022] Open
Abstract
Identification of disease-causing genes is a fundamental challenge for human health studies. The phenotypic similarity among diseases may reflect the interactions at the molecular level, and phenotype comparison can be used to predict disease candidate genes. Online Mendelian Inheritance in Man (OMIM) is a database of human genetic diseases and related genes that has become an authoritative source of disease phenotypes. However, disease phenotypes have been described by free text; thus, standardization of phenotypic descriptions is needed before diseases can be compared. Several disease phenotype networks have been established in OMIM using different standardization methods. Two of these networks are important for phenotypic similarity analysis: the first and most commonly used network (mimMiner) is standardized by medical subject heading, and the other network (resnikHPO) is the first to be standardized by human phenotype ontology. This paper comprehensively evaluates for the first time the accuracy of these two networks in gene prioritization based on protein–protein interactions using large-scale, leave-one-out cross-validation experiments. The results show that both networks can effectively prioritize disease-causing genes, and the approach that relates two diseases using a logistic function improves prioritization performance. Tanimoto, one of four methods for normalizing resnikHPO, generates a symmetric network and it performs similarly to mimMiner. Furthermore, an integration of these two networks outperforms either network alone in gene prioritization, indicating that these two disease networks are complementary.
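The abstract notes that Tanimoto normalization yields a symmetric network. As a hedged illustration, the sketch below computes the standard Tanimoto coefficient for two real-valued vectors and shows that swapping the arguments does not change the result. How the paper applies Tanimoto to the Resnik-based HPO scores is not stated in the abstract, so the toy vectors and the function name `tanimoto` are assumptions for illustration only.

```python
def tanimoto(a, b):
    """Tanimoto coefficient for real-valued vectors:
    T(a, b) = a.b / (|a|^2 + |b|^2 - a.b).
    The formula is unchanged when a and b are swapped, so a matrix of
    pairwise values is symmetric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a)
    norm_b = sum(y * y for y in b)
    denom = norm_a + norm_b - dot
    return dot / denom if denom else 0.0


# Toy phenotype profiles for two hypothetical diseases.
disease_a = [1.0, 0.0, 2.0]
disease_b = [1.0, 1.0, 0.0]

sim_ab = tanimoto(disease_a, disease_b)
sim_ba = tanimoto(disease_b, disease_a)
self_sim = tanimoto(disease_a, disease_a)  # identical profiles give 1.0
```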
Affiliation(s)
- Jianhua Li
  - Department of Biomedical Informatics, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
  - Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education, Shenyang, Liaoning, China
- Xiaoyan Lin
  - Department of Biomedical Informatics, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
- Yueyang Teng
  - Department of Biomedical Imaging, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
- Shouliang Qi
  - Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education, Shenyang, Liaoning, China
  - Department of Biomedical Imaging, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
- Dayu Xiao
  - Department of Biomedical Imaging, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
- Jianying Zhang
  - Department of Biomedical Informatics, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
  - Border Biomedical Research Center, Department of Biological Sciences, The University of Texas at El Paso, El Paso, Texas, United States of America
- Yan Kang
  - Key Laboratory of Medical Image Computing of Northeastern University, Ministry of Education, Shenyang, Liaoning, China
  - Department of Biomedical Imaging, Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, Liaoning, China
5
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023] Open
Abstract
The cardiomyopathies are a group of heart muscle diseases which can be inherited (familial). Identifying potential disease-related proteins is important to understand mechanisms of cardiomyopathies. Experimental identification of cardiomyopathies is costly and labour-intensive. In contrast, bioinformatics approaches have a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method for prioritizing disease candidate proteins that ranks candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top-ranked candidate proteins were related to the corresponding disease subtypes, and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
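The "guilt by association" ranking described in this abstract can be sketched in a few lines of Python. The scoring rule below (the fraction of a candidate's total edge weight that connects it to seed proteins) is an assumption chosen for simplicity, not the scoring function from the paper, and the protein names in `toy_network` are placeholders.

```python
def guilt_by_association_scores(network, seeds):
    """Score each non-seed protein by the share of its edge weight
    that goes to known disease (seed) proteins.

    network: dict mapping protein -> dict of neighbor -> edge weight.
    """
    seeds = set(seeds)
    scores = {}
    for protein, neighbors in network.items():
        if protein in seeds:
            continue  # seeds are already known disease proteins
        total = sum(neighbors.values())
        seed_weight = sum(w for n, w in neighbors.items() if n in seeds)
        scores[protein] = seed_weight / total if total else 0.0
    return scores


def rank_candidates(network, seeds):
    """Return candidates sorted by descending score (ties broken by name)."""
    scores = guilt_by_association_scores(network, seeds)
    return sorted(scores, key=lambda p: (-scores[p], p))


# Toy weighted interaction network; S1 and S2 play the role of seeds.
toy_network = {
    "S1": {"A": 0.9, "B": 0.2},
    "S2": {"A": 0.8},
    "A": {"S1": 0.9, "S2": 0.8, "C": 0.3},
    "B": {"S1": 0.2, "C": 0.8},
    "C": {"A": 0.3, "B": 0.8},
}
toy_seeds = {"S1", "S2"}

ranking = rank_candidates(toy_network, toy_seeds)
```

In this toy network, candidate A is linked mostly to seeds and ranks first, while C has no direct seed neighbors and ranks last.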
Affiliation(s)
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Weiming He
  - Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
  - Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
6
Portales-Casamar E, Ch'ng C, Lui F, St-Georges N, Zoubarev A, Lai AY, Lee M, Kwok C, Kwok W, Tseng L, Pavlidis P. Neurocarta: aggregating and sharing disease-gene relations for the neurosciences. BMC Genomics 2013; 14:129. [PMID: 23442263 PMCID: PMC3599981 DOI: 10.1186/1471-2164-14-129] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 07/31/2012] [Accepted: 02/23/2013] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND Understanding the genetic basis of diseases is key to the development of better diagnoses and treatments. Unfortunately, only a small fraction of the existing data linking genes to phenotypes is available through online public resources and, when available, it is scattered across multiple access tools. DESCRIPTION Neurocarta is a knowledgebase that consolidates information on genes and phenotypes across multiple resources and allows tracking and exploring of the associations. The system enables automatic and manual curation of evidence supporting each association, as well as user-enabled entry of their own annotations. Phenotypes are recorded using controlled vocabularies such as the Disease Ontology to facilitate computational inference and linking to external data sources. The gene-to-phenotype associations are filtered by stringent criteria to focus on the annotations most likely to be relevant. Neurocarta is constantly growing and currently holds more than 30,000 lines of evidence linking over 7,000 genes to 2,000 different phenotypes. CONCLUSIONS Neurocarta is a one-stop shop for researchers looking for candidate genes for any disorder of interest. In Neurocarta, they can review the evidence linking genes to phenotypes and filter out the evidence they're not interested in. In addition, researchers can enter their own annotations from their experiments and analyze them in the context of existing public annotations. Neurocarta's in-depth annotation of neurodevelopmental disorders makes it a unique resource for neuroscientists working on brain development.
Affiliation(s)
- Elodie Portales-Casamar
  - Centre for High-Throughput Biology and Department of Psychiatry, University of British Columbia, 2125 East Mall, Vancouver, BC V6T1Z4, Canada
7
Romi H, Cohen I, Landau D, Alkrinawi S, Yerushalmi B, Hershkovitz R, Newman-Heiman N, Cutting G, Ofir R, Sivan S, Birk O. Meconium ileus caused by mutations in GUCY2C, encoding the CFTR-activating guanylate cyclase 2C. Am J Hum Genet 2012; 90:893-9. [PMID: 22521417 DOI: 10.1016/j.ajhg.2012.03.022] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.2] [Received: 01/30/2012] [Revised: 02/29/2012] [Accepted: 03/28/2012] [Indexed: 12/19/2022] Open
Abstract
Meconium ileus, intestinal obstruction in the newborn, is caused in most cases by CFTR mutations modulated by yet-unidentified modifier genes. We now show that in two unrelated consanguineous Bedouin kindreds, an autosomal-recessive phenotype of meconium ileus that is not associated with cystic fibrosis (CF) is caused by different homozygous mutations in GUCY2C, leading to a dramatic reduction in, or full abrogation of, the enzymatic activity of the encoded guanylyl cyclase 2C. GUCY2C is a transmembrane receptor whose extracellular domain is activated by either the endogenous ligands, guanylin and related peptide uroguanylin, or by an external ligand, Escherichia coli (E. coli) heat-stable enterotoxin STa. GUCY2C is expressed in the human intestine, and the encoded protein activates the CFTR protein through local generation of cGMP. Thus, GUCY2C is a likely candidate modifier of the meconium ileus phenotype in CF. Because GUCY2C heterozygous and homozygous mutant mice are resistant to E. coli STa enterotoxin-induced diarrhea, it is plausible that GUCY2C mutations in the desert-dwelling Bedouin kindred are of selective advantage.
8
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.1] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
  - Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
9
Abstract
In gene prediction, studying phenotypes is highly valuable for reducing the number of locus candidates in association studies and to aid disease gene candidate prioritization. This is due to the intrinsic nature of phenotypes to visibly reflect genetic activity, making them potentially one of the most useful data types for functional studies. However, systematic use of these data has begun only recently. 'Comparative phenomics' is the analysis of genotype-phenotype associations across species and experimental methods. This is an emerging research field of utmost importance for gene discovery and gene function annotation. In this chapter, we review the use of phenotype data in the biomedical field. We will give an overview of phenotype resources, focusing on PhenomicDB--a cross-species genotype-phenotype database--which is the largest available collection of phenotype descriptions across species and experimental methods. We report on its latest extension by which genotype-phenotype relationships can be viewed as graphical representations of similar phenotypes clustered together ('phenoclusters'), supplemented with information from protein-protein interactions and Gene Ontology terms. We show that such 'phenoclusters' represent a novel approach to group genes functionally and to predict novel gene functions with high precision. We explain how these data and methods can be used to supplement the results of gene discovery approaches. The aim of this chapter is to assist researchers interested in understanding how phenotype data can be used effectively in the gene discovery field.
10
Mordechai S, Gradstein L, Pasanen A, Ofir R, El Amour K, Levy J, Belfair N, Lifshitz T, Joshua S, Narkis G, Elbedour K, Myllyharju J, Birk OS. High myopia caused by a mutation in LEPREL1, encoding prolyl 3-hydroxylase 2. Am J Hum Genet 2011; 89:438-45. [PMID: 21885030 DOI: 10.1016/j.ajhg.2011.08.003] [Citation(s) in RCA: 74] [Impact Index Per Article: 5.3] [Received: 06/11/2011] [Revised: 07/30/2011] [Accepted: 08/01/2011] [Indexed: 01/29/2023] Open
Abstract
Autosomal-recessive high-grade axial myopia was diagnosed in Bedouin Israeli consanguineous kindred. Some affected individuals also had variable expressivity of early-onset cataracts, peripheral vitreo-retinal degeneration, and secondary sight loss due to severe retinal detachments. Through genome-wide linkage analysis, the disease-associated gene was mapped to ∼1.7 Mb on chromosome 3q28 (the maximum LOD score was 11.5 at θ = 0 for marker D3S1314). Sequencing of the entire coding regions and intron-exon boundaries of the six genes within the defined locus identified a single mutation (c.1523G>T) in exon 10 of LEPREL1, encoding prolyl 3-hydroxylase 2 (P3H2), a 2-oxoglutarate-dependent dioxygenase that hydroxylates collagens. The mutation affects a glycine that is conserved within P3H isozymes. Analysis of wild-type and p.Gly508Val (c.1523G>T) mutant recombinant P3H2 polypeptides expressed in insect cells showed that the mutation led to complete inactivation of P3H2.
Affiliation(s)
- Shikma Mordechai
  - The Morris Kahn Laboratory of Human Genetics, National Institute for Biotechnology in the Negev, Ben Gurion University of the Negev, Beer-Sheva Israel
11
The desmosterolosis phenotype: spasticity, microcephaly and micrognathia with agenesis of corpus callosum and loss of white matter. Eur J Hum Genet 2011; 19:942-6. [PMID: 21559050 DOI: 10.1038/ejhg.2011.74] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022] Open
Abstract
Desmosterolosis is a rare autosomal recessive disorder of elevated levels of the cholesterol precursor desmosterol in plasma, tissue and cultured cells. With only two sporadic cases described to date with two very different phenotypes, the clinical entity arising from mutations in 24-dehydrocholesterol reductase (DHCR24) has yet to be defined. We now describe consanguineous Bedouin kindred with four surviving affected individuals, all presenting with severe failure to thrive, psychomotor retardation, microcephaly, micrognathia and spasticity with variable degree of hand contractures. Convulsions near birth, nystagmus and strabismus were found in most. Brain MRI demonstrated significant reduction in white matter and near agenesis of corpus callosum in all. Genome-wide linkage analysis and fine mapping defined a 6.75 cM disease-associated locus in chromosome 1 (maximum multipoint LOD score of six), and sequencing of candidate genes within this locus identified in the affected individuals a homozygous missense mutation in DHCR24 leading to dramatically augmented plasma desmosterol levels. We thus establish a clear consistent phenotype of desmosterolosis (MIM 602398).
12
Cohen R, Gefen A, Elhadad M, Birk OS. CSI-OMIM--Clinical Synopsis Search in OMIM. BMC Bioinformatics 2011; 12:65. [PMID: 21362185 PMCID: PMC3053257 DOI: 10.1186/1471-2105-12-65] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 06/13/2010] [Accepted: 03/01/2011] [Indexed: 11/18/2022] Open
Abstract
Background The OMIM database is a tool used daily by geneticists. Syndrome pages include a Clinical Synopsis section containing a list of known phenotypes comprising a clinical syndrome. The phenotypes are in free text and different phrases are often used to describe the same phenotype, the differences originating in spelling variations or typing errors, varying sentence structures and terminological variants. These variations hinder searching for syndromes or using the large amount of phenotypic information for research purposes. In addition, negation forms also create false positives when searching the textual description of phenotypes and induce noise in text mining applications. Description Our method allows efficient and complete search of OMIM phenotypes as well as improved data-mining of the OMIM phenome. Applying natural language processing, each phrase is tagged with additional semantic information using UMLS and MESH. Using a grammar based method, annotated phrases are clustered into groups denoting similar phenotypes. These groups of synonymous expressions enable precise search, as query terms can be matched with the many variations that appear in OMIM, while avoiding over-matching expressions that include the query term in a negative context. On the basis of these clusters, we computed pair-wise similarity among syndromes in OMIM. Using this new similarity measure, we identified 79,770 new connections between syndromes, an average of 16 new connections per syndrome. Our project is Web-based and available at http://fohs.bgu.ac.il/s2g/csiomim Conclusions The resulting enhanced search functionality provides clinicians with an efficient tool for diagnosis. This search application is also used for finding similar syndromes for the candidate gene prioritization tool S2G. The enhanced OMIM database we produced can be further used for bioinformatics purposes such as linking phenotypes and genes based on syndrome similarities and the known genes in Morbidmap.
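The search behaviour described in this abstract (matching a query against clusters of synonymous phrases while skipping negated mentions) can be sketched as follows. This is a toy illustration under stated assumptions: the data layout, the function name `search_syndromes`, and the cluster and syndrome names are all hypothetical, and the real system relies on UMLS/MeSH tagging and a grammar-based clustering step that is not reproduced here.

```python
def search_syndromes(query, clusters, syndromes):
    """Return syndromes that positively mention any phrase from the
    cluster(s) containing the query term.

    clusters:  cluster id -> set of synonymous phrases (lowercase).
    syndromes: syndrome name -> list of (cluster id, negated) pairs,
               where negated=True marks phrases such as "no seizures".
    """
    q = query.lower()
    hit_clusters = {cid for cid, phrases in clusters.items() if q in phrases}
    return sorted(
        name
        for name, annotations in syndromes.items()
        if any(cid in hit_clusters and not negated for cid, negated in annotations)
    )


# Hypothetical phrase clusters and syndrome annotations.
toy_clusters = {
    "c1": {"seizures", "convulsions", "epileptic seizures"},
    "c2": {"short stature", "small stature"},
}
toy_syndromes = {
    "Syndrome A": [("c1", False)],
    "Syndrome B": [("c1", True), ("c2", False)],  # seizures explicitly absent
    "Syndrome C": [("c2", False)],
}

seizure_hits = search_syndromes("convulsions", toy_clusters, toy_syndromes)
stature_hits = search_syndromes("Small stature", toy_clusters, toy_syndromes)
```

Here "Syndrome B" is excluded from the seizure search because its only mention of that cluster is negated, which mirrors the over-matching problem the abstract describes.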
Affiliation(s)
- Raphael Cohen
  - The Morris Kahn Laboratory of Human Genetics, National Institute for Biotechnology in the Negev, Ben-Gurion University, Beer-Sheva, Israel.
13
Abstract
Despite increasing sequencing capacity, genetic disease investigation still frequently results in the identification of loci containing multiple candidate disease genes that need to be tested for involvement in the disease. This process can be expedited by prioritizing the candidates prior to testing. Over the last decade, a large number of computational methods and tools have been developed to assist the clinical geneticist in prioritizing candidate disease genes. In this chapter, we give an overview of computational tools that can be used for this purpose, all of which are freely available over the web.
Affiliation(s)
- Martin Oti
  - Structural and Computational Biology Division, Victor Chang Cardiac Research Institute, 2010, Darlinghurst, NSW, Australia.
14
Feinstein M, Markus B, Noyman I, Shalev H, Flusser H, Shelef I, Liani-Leibson K, Shorer Z, Cohen I, Khateeb S, Sivan S, Birk OS. Pelizaeus-Merzbacher-like disease caused by AIMP1/p43 homozygous mutation. Am J Hum Genet 2010; 87:820-8. [PMID: 21092922 DOI: 10.1016/j.ajhg.2010.10.016] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Received: 08/22/2010] [Revised: 10/10/2010] [Accepted: 10/14/2010] [Indexed: 01/23/2023] Open
Abstract
Pelizaeus-Merzbacher disease is an X-linked hypomyelinating leukodystrophy caused by PLP1 mutations. A similar autosomal-recessive phenotype, Pelizaeus-Merzbacher-like disease (PMLD), has been shown to be caused by homozygous mutations in GJC2 or HSPD1. We report a consanguineous Israeli Bedouin kindred with clinical and radiological findings compatible with PMLD in which linkage to PLP1, GJC2, and HSPD1 was excluded. Through genome-wide homozygosity mapping and mutation analysis, we demonstrated in all affected individuals a homozygous frameshift mutation that fully abrogates the main active domain of AIMP1, encoding ARS-interacting multifunctional protein 1. The mutation fully segregates with the disease-associated phenotype and was not found in 250 Bedouin controls. Our findings are in line with the previously demonstrated inability of mutant mice lacking the AIMP1/p43 ortholog to maintain axon integrity in the central and peripheral neural system.
15
Feldshtein M, Elkrinawi S, Yerushalmi B, Marcus B, Vullo D, Romi H, Ofir R, Landau D, Sivan S, Supuran CT, Birk OS. Hyperchlorhidrosis caused by homozygous mutation in CA12, encoding carbonic anhydrase XII. Am J Hum Genet 2010; 87:713-20. [PMID: 21035102 DOI: 10.1016/j.ajhg.2010.10.008] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 09/22/2010] [Revised: 10/08/2010] [Accepted: 10/12/2010] [Indexed: 12/22/2022] Open
Abstract
Excessive chloride secretion in sweat (hyperchlorhidrosis), leading to a positive sweat test, is most commonly indicative of cystic fibrosis yet is found also in conjunction with various metabolic, endocrine, and dermatological disorders. There is conflicting evidence regarding the existence of autosomal-recessive hyperchlorhidrosis. We now describe a consanguineous Israeli Bedouin kindred with autosomal-recessive hyperchlorhidrosis whose sole symptoms are visible salt precipitates after sweating, a preponderance to hyponatremic dehydration, and poor feeding and slow weight gain at infancy. Through genome-wide linkage analysis, we demonstrate that the phenotype is due to a homozygous mutation in CA12, encoding carbonic anhydrase XII. The mutant (c.427G>A [p.Glu143Lys]) protein showed 71% activity of the wild-type enzyme for catalyzing the CO₂ hydration to bicarbonate and H(+), and it bound the clinically used sulfonamide inhibitor acetazolamide with high affinity (K(I) of 10 nM). Unlike the wild-type enzyme, which is not inhibited by chloride, bromide, or iodide (K(I)s of 73-215 mM), the mutant is inhibited in the submicromolar range by these anions (K(I)s of 0.37-0.73 mM).
16
Agamy O, Ben Zeev B, Lev D, Marcus B, Fine D, Su D, Narkis G, Ofir R, Hoffmann C, Leshinsky-Silver E, Flusser H, Sivan S, Söll D, Lerman-Sagie T, Birk OS. Mutations disrupting selenocysteine formation cause progressive cerebello-cerebral atrophy. Am J Hum Genet 2010; 87:538-44. [PMID: 20920667 DOI: 10.1016/j.ajhg.2010.09.007] [Citation(s) in RCA: 104] [Impact Index Per Article: 6.9] [Received: 06/19/2010] [Revised: 08/22/2010] [Accepted: 09/08/2010] [Indexed: 10/19/2022] Open
Abstract
The essential micronutrient selenium is found in proteins as selenocysteine (Sec), the only genetically encoded amino acid whose biosynthesis occurs on its cognate tRNA in humans. In the final step of selenocysteine formation, the essential enzyme SepSecS catalyzes the conversion of Sep-tRNA to Sec-tRNA. We demonstrate that SepSecS mutations cause autosomal-recessive progressive cerebellocerebral atrophy (PCCA) in Jews of Iraqi and Moroccan ancestry. Both founder mutations, common in these two populations, disrupt the sole route to the biosynthesis of the 21st amino acid, Sec, and thus to the generation of selenoproteins in humans.