1
van Ravesteyn TW, Dekker M, Riele HT. Mono- and Biallelic Replication-Coupled Gene Editing Discriminates Dominant-Negative and Loss-of-Function Variants of DNA Mismatch Repair Genes. J Mol Diagn 2024; 26:805-814. [PMID: 38925454 DOI: 10.1016/j.jmoldx.2024.05.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 05/08/2024] [Accepted: 05/23/2024] [Indexed: 06/28/2024]
Abstract
Replication-coupled gene editing using locked nucleic acid-modified single-stranded DNA oligonucleotides (LMOs) can genetically engineer mammalian cells with high precision at single nucleotide resolution. Based on this method, oligonucleotide-directed mutation screening (ODMS) was developed to determine whether variants of uncertain clinical significance of DNA mismatch repair (MMR) genes can cause Lynch syndrome. In ODMS, the appearance of 6-thioguanine-resistant colonies upon introduction of the variant is indicative for defective MMR and hence pathogenicity. Whereas mouse embryonic stem cells (mESCs) hemizygous for MMR genes were used previously, we now show that ODMS can also be applied in wild-type mESCs carrying two functional alleles of each MMR gene. 6-Thioguanine resistance can result from two possible events: first, the mutation is present in only one allele, which is indicative for dominant-negative activity of the variant; and second, both alleles contain the planned modification, which is indicative for a regular loss-of-function variant. Thus, ODMS in wild-type mESCs can discriminate fully disruptive and dominant-negative MMR variants. The feasibility of biallelic targeting suggests that the efficiency of LMO-mediated gene targeting at a nonselectable locus may be enriched in cells that had undergone a simultaneous selectable LMO targeting event. This turned out to be the case and provided a protocol to improve recovery of LMO-mediated gene modification events.
Affiliation(s)
- Thomas W van Ravesteyn
  - Division of Tumor Biology and Immunology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Marleen Dekker
  - Division of Tumor Biology and Immunology, Netherlands Cancer Institute, Amsterdam, the Netherlands
- Hein Te Riele
  - Division of Tumor Biology and Immunology, Netherlands Cancer Institute, Amsterdam, the Netherlands
2
Abildgaard AB, Nielsen SV, Bernstein I, Stein A, Lindorff-Larsen K, Hartmann-Petersen R. Lynch syndrome, molecular mechanisms and variant classification. Br J Cancer 2023; 128:726-734. [PMID: 36434153 PMCID: PMC9978028 DOI: 10.1038/s41416-022-02059-z] [Citation(s) in RCA: 24] [Impact Index Per Article: 12.0] [Received: 06/17/2022] [Revised: 10/31/2022] [Accepted: 11/02/2022] [Indexed: 11/27/2022]
Abstract
Patients with the heritable cancer disease, Lynch syndrome, carry germline variants in the MLH1, MSH2, MSH6 and PMS2 genes, encoding the central components of the DNA mismatch repair system. Loss-of-function variants disrupt the DNA mismatch repair system and give rise to a detrimental increase in the cellular mutational burden and cancer development. The treatment prospects for Lynch syndrome rely heavily on early diagnosis; however, accurate diagnosis is inextricably linked to correct clinical interpretation of individual variants. Protein variant classification traditionally relies on cumulative information from occurrence in patients, as well as experimental testing of the individual variants. The complexity of variant classification is due to (1) that variants of unknown significance are rare in the population and phenotypic information on the specific variants is missing, and (2) that individual variant testing is challenging, costly and slow. Here, we summarise recent developments in high-throughput technologies and computational prediction tools for the assessment of variants of unknown significance in Lynch syndrome. These approaches may vastly increase the number of interpretable variants and could also provide important mechanistic insights into the disease. These insights may in turn pave the road towards developing personalised treatment approaches for Lynch syndrome.
Affiliation(s)
- Amanda B Abildgaard
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Sofie V Nielsen
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Inge Bernstein
  - Department of Surgical Gastroenterology, Aalborg University Hospital, Aalborg, Denmark
  - Institute of Clinical Medicine, Aalborg University Hospital, Aalborg University, Aalborg, Denmark
- Amelie Stein
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Rasmus Hartmann-Petersen
  - The Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
3
Eboreime J, Choi SK, Yoon SR, Sadybekov A, Katritch V, Calabrese P, Arnheim N. Germline selection of PTPN11 (HGNC:9644) variants make a major contribution to both Noonan syndrome's high birth rate and the transmission of sporadic cancer variants resulting in fetal abnormality. Hum Mutat 2022; 43:2205-2221. [PMID: 36349709 PMCID: PMC10099774 DOI: 10.1002/humu.24493] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/04/2022] [Revised: 09/20/2022] [Accepted: 10/12/2022] [Indexed: 11/10/2022]
Abstract
Some spontaneous germline gain-of-function mutations promote spermatogonial stem cell clonal expansion and disproportionate variant sperm production leading to unexpectedly high transmission rates for some human genetic conditions. To measure the frequency and spatial distribution of de novo mutations we divided three testes into 192 pieces each and used error-corrected deep-sequencing on each piece. We focused on PTPN11 (HGNC:9644) Exon 3 that contains 30 different PTPN11 Noonan syndrome (NS) mutation sites. We found 14 of these variants formed clusters among the testes; one testis had 11 different variant clusters. The mutation frequencies of these different clusters were not correlated with their case-recurrence rates nor were case recurrence rates of PTPN11 variants correlated with their tyrosine phosphatase levels thereby confusing PTPN11's role in germline clonal expansion. Six of the PTPN11 exon 3 de novo variants associated with somatic mutation-induced sporadic cancers (but not NS) also formed testis clusters. Further, three of these six variants were observed among fetuses that underwent prenatal ultrasound screening for NS-like features. Mathematical modeling showed that germline selection can explain both the mutation clusters and the high incidence of NS (1/1000-1/2500).
Affiliation(s)
- Jordan Eboreime
  - Department of Biological Sciences, Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Soo-Kyung Choi
  - Department of Biological Sciences, Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Song-Ro Yoon
  - Department of Biological Sciences, Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
- Anastasiia Sadybekov
  - Department of Chemistry, Bridge Institute, University of Southern California, Los Angeles, California, USA
- Vsevolod Katritch
  - Department of Chemistry, Bridge Institute, University of Southern California, Los Angeles, California, USA
- Peter Calabrese
  - Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, California, USA
- Norman Arnheim
  - Department of Biological Sciences, Molecular and Computational Biology Program, University of Southern California, Los Angeles, California, USA
4
Liu Y, Yeung WSB, Chiu PCN, Cao D. Computational approaches for predicting variant impact: An overview from resources, principles to applications. Front Genet 2022; 13:981005. [PMID: 36246661 PMCID: PMC9559863 DOI: 10.3389/fgene.2022.981005] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/29/2022] [Accepted: 08/08/2022] [Indexed: 11/13/2022]
Abstract
One objective of human genetics is to unveil the variants that contribute to human diseases. With the rapid development and wide use of next-generation sequencing (NGS), massive genomic sequence data have been created, making personal genetic information available. Conventional experimental evidence is critical in establishing the relationship between sequence variants and phenotype, but it is obtained with low efficiency. Due to the lack of comprehensive databases and resources which present clinical and experimental evidence on genotype-phenotype relationship, as well as accumulating variants found from NGS, different computational tools that can predict the impact of the variants on phenotype have been greatly developed to bridge the gap. In this review, we present a brief introduction and discussion about the computational approaches for variant impact prediction. Following an innovative manner, we mainly focus on approaches for non-synonymous variants (nsSNVs) impact prediction and categorize them into six classes. Their underlying rationale and constraints, together with the concerns and remedies raised from comparative studies, are discussed. We also present how the predictive approaches are employed in different research. Although diverse constraints exist, the computational predictive approaches are indispensable in exploring genotype-phenotype relationship.
Affiliation(s)
- Ye Liu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
- William S. B. Yeung
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Philip C. N. Chiu
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
  - Department of Obstetrics and Gynaecology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Dandan Cao
  - Shenzhen Key Laboratory of Fertility Regulation, Reproductive Medicine Center, The University of Hong Kong-Shenzhen Hospital, Shenzhen, China
5
Yang Y, Zeng L, Vihinen M. PON-Sol2: Prediction of Effects of Variants on Protein Solubility. Int J Mol Sci 2021; 22:8027. [PMID: 34360790 PMCID: PMC8348231 DOI: 10.3390/ijms22158027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 06/11/2021] [Revised: 07/19/2021] [Accepted: 07/22/2021] [Indexed: 01/13/2023]
Abstract
Genetic variations have a multitude of effects on proteins. A substantial number of variations affect protein-solvent interactions, either aggregation or solubility. Aggregation is often related to structural alterations, whereas solubilizable proteins in the solid phase can be made again soluble by dilution. Solubility is a central protein property and when reduced can lead to diseases. We developed a prediction method, PON-Sol2, to identify amino acid substitutions that increase, decrease, or have no effect on the protein solubility. The method is a machine learning tool utilizing gradient boosting algorithm and was trained on a large dataset of variants with different outcomes after the selection of features among a large number of tested properties. The method is fast and has high performance. The normalized correct prediction rate for three states is 0.656, and the normalized GC2 score is 0.312 in 10-fold cross-validation. The corresponding numbers in the blind test were 0.545 and 0.157. The performance was superior in comparison to previous methods. The PON-Sol2 predictor is freely available. It can be used to predict the solubility effects of variants for any organism, even in large-scale projects.
Affiliation(s)
- Yang Yang
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
- Lianjie Zeng
  - School of Computer Science and Technology, Soochow University, Suzhou 215006, China
  - Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210000, China
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84 Lund, Sweden
6
Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2020:5710862. [PMID: 32016318 PMCID: PMC6997940 DOI: 10.1093/database/baz117] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/12/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]
Abstract
Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. 
We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies, and show that such comparisons are feasible and useful, however, there may be limitations due to lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
Affiliation(s)
- Anasua Sarkar
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
- Yang Yang
  - School of Computer Science and Technology, Soochow University, No1. Shizi Street, Suzhou, 215006 Jiangsu, China
  - Provincial Key Laboratory for Computer Information Processing Technology, No1. Shizi Street, Soochow University, Suzhou, 215006 Jiangsu, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
7
Schaafsma GCP, Vihinen M. Representativeness of variation benchmark datasets. BMC Bioinformatics 2018; 19:461. [PMID: 30497376 PMCID: PMC6267811 DOI: 10.1186/s12859-018-2478-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/25/2018] [Accepted: 11/09/2018] [Indexed: 12/14/2022]
Abstract
Background: Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects. Results: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how proteins in the benchmark datasets were representative for the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in VariBench or VariSNP databases. We tested also whether the pathogenic variant datasets contained neutral variants defined as those that have high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets. Conclusions: None of the datasets was found to be well representative. Many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates to representativeness but only weakly to the performance of methods trained on them. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2478-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gerard C P Schaafsma
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84, Lund, Sweden
- Mauno Vihinen
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84, Lund, Sweden
8
PON-tstab: Protein Variant Stability Predictor. Importance of Training Data Quality. Int J Mol Sci 2018; 19:ijms19041009. [PMID: 29597263 PMCID: PMC5979465 DOI: 10.3390/ijms19041009] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 03/05/2018] [Revised: 03/21/2018] [Accepted: 03/24/2018] [Indexed: 12/24/2022]
Abstract
Several methods have been developed to predict effects of amino acid substitutions on protein stability. Benchmark datasets are essential for method training and testing and have numerous requirements including that the data is representative for the investigated phenomenon. Available machine learning algorithms for variant stability have all been trained with ProTherm data. We noticed a number of issues with the contents, quality and relevance of the database. There were errors, but also features that had not been clearly communicated. Consequently, all machine learning variant stability predictors have been trained on biased and incorrect data. We obtained a corrected dataset and trained a random forests-based tool, PON-tstab, applicable to variants in any organism. Our results highlight the importance of the benchmark quality, suitability and appropriateness. Predictions are provided for three categories: stability decreasing, increasing and those not affecting stability.
9
Čalyševa J, Vihinen M. PON-SC - program for identifying steric clashes caused by amino acid substitutions. BMC Bioinformatics 2017; 18:531. [PMID: 29187139 PMCID: PMC5707825 DOI: 10.1186/s12859-017-1947-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 06/14/2017] [Accepted: 11/21/2017] [Indexed: 11/10/2022]
Abstract
Background: Amino acid substitutions due to DNA nucleotide replacements are frequently disease-causing because of affecting functionally important sites. If the substituting amino acid does not fit into the protein, it causes structural alterations that are often harmful. Clashes of amino acids cause local or global structural changes. Testing structural compatibility of variations has been difficult due to the lack of a dedicated method that could handle vast amounts of variation data produced by next generation sequencing technologies. Results: We developed a method, PON-SC, for detecting protein structural clashes due to amino acid substitutions. The method utilizes side chain rotamer library and tests whether any of the common rotamers can be fitted into the protein structure. The tool was tested both with variants that cause and do not cause clashes and found to have accuracy of 0.71 over five test datasets. Conclusions: We developed a fast method for residue side chain clash detection. The method provides in addition to the prediction also visualization of the variant in three dimensional structure. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1947-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jelena Čalyševa
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-22 184, Lund, Sweden
  - Present address: EMBL Heidelberg, Meyerhofstraße 1, 69117, Heidelberg, Germany
- Mauno Vihinen
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-22 184, Lund, Sweden
10
Schaafsma GCP, Vihinen M. Large differences in proportions of harmful and benign amino acid substitutions between proteins and diseases. Hum Mutat 2017; 38:839-848. [DOI: 10.1002/humu.23236] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/22/2017] [Revised: 04/05/2017] [Accepted: 04/20/2017] [Indexed: 12/21/2022]
Affiliation(s)
- Gerard C. P. Schaafsma
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden
11
Arora S, Huwe PJ, Sikder R, Shah M, Browne AJ, Lesh R, Nicolas E, Deshpande S, Hall MJ, Dunbrack RL, Golemis EA. Functional analysis of rare variants in mismatch repair proteins augments results from computation-based predictive methods. Cancer Biol Ther 2017; 18:519-533. [PMID: 28494185 DOI: 10.1080/15384047.2017.1326439] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Abstract
The cancer-predisposing Lynch Syndrome (LS) arises from germline mutations in DNA mismatch repair (MMR) genes, predominantly MLH1, MSH2, MSH6, and PMS2. A major challenge for clinical diagnosis of LS is the frequent identification of variants of uncertain significance (VUS) in these genes, as it is often difficult to determine variant pathogenicity, particularly for missense variants. Generic programs such as SIFT and PolyPhen-2, and MMR gene-specific programs such as PON-MMR and MAPP-MMR, are often used to predict deleterious or neutral effects of VUS in MMR genes. We evaluated the performance of multiple predictive programs in the context of functional biologic data for 15 VUS in MLH1, MSH2, and PMS2. Using cell line models, we characterized VUS predicted to range from neutral to pathogenic on mRNA and protein expression, basal cellular viability, viability following treatment with a panel of DNA-damaging agents, and functionality in DNA damage response (DDR) signaling, benchmarking to wild-type MMR proteins. Our results suggest that the MMR gene-specific classifiers do not always align with the experimental phenotypes related to DDR. Our study highlights the importance of complementary experimental and computational assessment to develop future predictors for the assessment of VUS.
Affiliation(s)
- Sanjeevani Arora
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Peter J Huwe
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Rahmat Sikder
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Manali Shah
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Amanda J Browne
  - Immersion Science Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Randy Lesh
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Emmanuelle Nicolas
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Sanat Deshpande
  - Immersion Science Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Michael J Hall
  - Department of Clinical Genetics, Fox Chase Cancer Center, Philadelphia, PA, USA
- Roland L Dunbrack
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
- Erica A Golemis
  - Molecular Therapeutics Program, Fox Chase Cancer Center, Philadelphia, PA, USA
12
Niroula A, Vihinen M. PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned. Hum Mutat 2017; 38:1085-1091. [PMID: 28224672 DOI: 10.1002/humu.23199] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/24/2016] [Revised: 01/25/2017] [Accepted: 02/17/2017] [Indexed: 01/14/2023]
Abstract
Computational tools are widely used for ranking and prioritizing variants for characterizing their disease relevance. Since numerous tools have been developed, they have to be properly assessed before being applied. Critical Assessment of Genome Interpretation (CAGI) experiments have significantly contributed toward the assessment of prediction methods for various tasks. Within and outside the CAGI, we have addressed several questions that facilitate development and assessment of variation interpretation tools. These areas include collection and distribution of benchmark datasets, their use for systematic large-scale method assessment, and the development of guidelines for reporting methods and their performance. For us, CAGI has provided a chance to experiment with new ideas, test the application areas of our methods, and network with other prediction method developers. In this article, we discuss our experiences and lessons learned from the various CAGI challenges. We describe our approaches, their performance, and impact of CAGI on our research. Finally, we discuss some of the possibilities that CAGI experiments have opened up and make some suggestions for future experiments.
Affiliation(s)
- Abhishek Niroula
  - Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden
- Mauno Vihinen
  - Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden
13
Tricarico R, Kasela M, Mareni C, Thompson BA, Drouet A, Staderini L, Gorelli G, Crucianelli F, Ingrosso V, Kantelinen J, Papi L, De Angioletti M, Berardi M, Gaildrat P, Soukarieh O, Turchetti D, Martins A, Spurdle AB, Nyström M, Genuardi M. Assessment of the InSiGHT Interpretation Criteria for the Clinical Classification of 24 MLH1 and MSH2 Gene Variants. Hum Mutat 2016; 38:64-77. [PMID: 27629256 DOI: 10.1002/humu.23117] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/26/2016] [Revised: 09/04/2016] [Accepted: 09/09/2016] [Indexed: 01/15/2023]
Abstract
Pathogenicity assessment of DNA variants in disease genes to explain their clinical consequences is an integral component of diagnostic molecular testing. The International Society for Gastrointestinal Hereditary Tumors (InSiGHT) has developed specific criteria for the interpretation of mismatch repair (MMR) gene variants. Here, we performed a systematic investigation of 24 MLH1 and MSH2 variants. The assessments were done by analyzing population frequency, segregation, tumor molecular characteristics, RNA effects, protein expression levels, and in vitro MMR activity. Classifications were confirmed for 15 variants and changed for three, and for the first time determined for six novel variants. Overall, based on our results, we propose the introduction of some refinements to the InSiGHT classification rules. The proposed changes have the advantage of homogenizing the InSiGHT interpretation criteria with those set out by the Evidence-based Network for the Interpretation of Germline Mutant Alleles (ENIGMA) consortium for the BRCA1/BRCA2 genes. We also observed that the addition of only a few clinical data was sufficient to obtain a more stable classification for variants considered as "likely pathogenic" or "likely nonpathogenic." This shows the importance of obtaining as many points of evidence as possible for variant interpretation, especially from the clinical setting.
Collapse
Affiliation(s)
- Rossella Tricarico
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
  - Cancer Epigenetics and Cancer Biology Programs, Fox Chase Cancer Center, Philadelphia, Pennsylvania
- Mariann Kasela
  - Department of Biosciences, Division of Genetics, University of Helsinki, Helsinki, Finland
- Bryony A Thompson
  - Department of Oncological Sciences, Huntsman Cancer Institute, University of Utah, Salt Lake City, Utah
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Victoria, Australia
- Aurélie Drouet
  - Inserm-U1079-IRIB, Normandy Centre for Genomic and Personalized Medicine, University of Rouen, Rouen, France
- Lucia Staderini
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
  - Microbiology and Virology Department, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Greta Gorelli
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
- Francesca Crucianelli
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
- Valentina Ingrosso
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
- Jukka Kantelinen
  - Department of Biosciences, Division of Genetics, University of Helsinki, Helsinki, Finland
- Laura Papi
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
- Maria De Angioletti
  - Cancer Genetics and Gene Transfer - Core Research Laboratory, Istituto Toscano Tumori, Florence, Italy
  - ICCOM-CNR, Sesto Fiorentino, Italy
- Margherita Berardi
  - Cancer Genetics and Gene Transfer - Core Research Laboratory, Istituto Toscano Tumori, Florence, Italy
- Pascaline Gaildrat
  - Inserm-U1079-IRIB, Normandy Centre for Genomic and Personalized Medicine, University of Rouen, Rouen, France
- Omar Soukarieh
  - Inserm-U1079-IRIB, Normandy Centre for Genomic and Personalized Medicine, University of Rouen, Rouen, France
- Daniela Turchetti
  - Medical Genetics, Department of Medical and Surgical Sciences (DIMEC), University of Bologna, Bologna, Italy
- Alexandra Martins
  - Inserm-U1079-IRIB, Normandy Centre for Genomic and Personalized Medicine, University of Rouen, Rouen, France
- Amanda B Spurdle
  - Genetics and Computational Biology Department, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Minna Nyström
  - Department of Biosciences, Division of Genetics, University of Helsinki, Helsinki, Finland
- Maurizio Genuardi
  - Department of Biomedical, Experimental and Clinical Sciences, Medical Genetics Unit, University of Florence, Florence, Italy
  - Institute of Genomic Medicine, A. Gemelli School of Medicine, Medical Genetics Unit, Catholic University of the Sacred Heart, Rome, Italy
14
Riera C, Padilla N, de la Cruz X. The Complementarity Between Protein-Specific and General Pathogenicity Predictors for Amino Acid Substitutions. Hum Mutat 2016; 37:1013-24. [DOI: 10.1002/humu.23048] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.1] [Received: 03/30/2016] [Revised: 06/30/2016] [Accepted: 07/06/2016] [Indexed: 11/06/2022]
Affiliation(s)
- Casandra Riera
  - Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
- Natàlia Padilla
  - Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
- Xavier de la Cruz
  - Research Unit in Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
  - ICREA, Barcelona, Spain
15
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-22184 Lund, Sweden
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-22184 Lund, Sweden
16
Niroula A, Vihinen M. PON-mt-tRNA: a multifactorial probability-based method for classification of mitochondrial tRNA variations. Nucleic Acids Res 2016; 44:2020-7. [PMID: 26843426 PMCID: PMC4797295 DOI: 10.1093/nar/gkw046] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 09/18/2015] [Accepted: 01/14/2016] [Indexed: 12/19/2022] Open
Abstract
Transfer RNAs (tRNAs) are essential for encoding the transcribed genetic information from DNA into proteins. Variations in the human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations using a classification based on evidence from several sources and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity using evidence of segregation, biochemistry and histochemistry to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. The variations in the loops are more often tolerated compared to the variations in stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at http://structure.bmc.lu.se/PON-mt-tRNA/.
Affiliation(s)
- Abhishek Niroula
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-22184 Lund, Sweden
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, SE-22184 Lund, Sweden
17
Peng Y, Alexov E. Investigating the linkage between disease-causing amino acid variants and their effect on protein stability and binding. Proteins 2016; 84:232-9. [PMID: 26650512 DOI: 10.1002/prot.24968] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 10/28/2015] [Accepted: 11/30/2015] [Indexed: 12/12/2022]
Abstract
Single amino acid variations (SAVs) occurring in the human population result in natural differences between individuals or cause diseases. It is well understood that the molecular effect of an SAV can be manifested as changes in the wild-type characteristics of the corresponding protein, among which are protein stability and protein interactions. Typically, the effect of SAVs on protein stability and interactions has been assessed via changes in the wild-type folding and binding free energies. However, in terms of SAVs affecting protein functionality and disease susceptibility, one wants to know to what extent the wild-type function is perturbed by the SAV. Here it is demonstrated that the relative, rather than the absolute, change in the folding and binding free energy serves as a good indicator of SAV association with disease. Using HumVar as a source of disease-causing SAVs and experimentally determined free energy changes from the ProTherm and SKEMPI databases, correlation coefficients (CCs) were obtained between the disease index (Pd) and the relative folding (Ppr,f) and binding (Ppr,b) probability indexes, respectively. The obtained CCs demonstrated the applicability of the proposed approach and showed that it serves as a good indicator of SAV association with disease.
Affiliation(s)
- Yunhui Peng
  - Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, 29634
- Emil Alexov
  - Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, South Carolina, 29634