1
Jeon H, Lee JL, Shim H, Joe S, Byeon I, Kim CW, Lim SB, Park IJ, Yoon YS, Chu HBK, Kim YJ, Yu CS, Yang JO. Genetic variations and recurrence in stage III Korean colorectal cancer: Insights from tumor-only mutation analysis. PLoS One 2025; 20:e0323302. [PMID: 40408428 PMCID: PMC12101642 DOI: 10.1371/journal.pone.0323302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Accepted: 04/06/2025] [Indexed: 05/25/2025]
Abstract
Colorectal cancer (CRC) has the second highest incidence rate among all cancers in Korea, with approximately 30% of patients with regional CRC experiencing recurrence. Understanding the genetic drivers of recurrence is essential for early detection and targeted treatment. Therefore, many studies have focused on genetic analysis using tumor-normal matched samples, as this approach provides more comprehensive insights. However, tumor-only samples are far more common in clinical practice because of the difficulty in obtaining normal tissues, making developing robust methods for analyzing tumor-only data a pressing need. This study aimed to investigate the genetic variations associated with CRC recurrence using tumor-only whole-exome sequencing data from 200 Korean patients with stage III CRC. By applying stringent filtering using public databases including Genome Aggregation Database (gnomAD), Exome Aggregation Consortium (ExAC), Single Nucleotide Polymorphism Database (dbSNP), 1000 Genomes Project (1000G), Korean Variant Archive 2 (KOVA2), and Korean Reference Genome Database (KRGDB), we identified 221 statistically significant mutations across 195 genes with distinct distributions between the recurrence and non-recurrence groups. Furthermore, statistical analysis of the clinical data revealed that the T-category, N-category, and preoperative carcinoembryonic antigen levels were correlated with CRC recurrence. Moreover, we identified nine networks through protein-protein interaction analysis and identified networks with high feature importance. We also developed a CRC recurrence prediction model using PyCaret, which achieved an area under the curve (AUC) of 0.77. Our findings highlight the importance of robust variant filtering in tumor-only sample analyses and provide insights into the genetic landscape of CRC recurrence in the Korean population.
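The population-frequency filtering step described in this abstract can be illustrated with a minimal sketch. It assumes each variant carries allele frequencies from several population databases and discards any variant that is common in at least one of them as likely germline. The 1% cutoff, the field names, and the example records are hypothetical and are not taken from the study.

```python
# Hypothetical cutoff: a variant at or above this allele frequency in any
# population database is treated as a likely germline polymorphism.
MAX_POPULATION_AF = 0.01


def is_candidate_somatic(variant, max_af=MAX_POPULATION_AF):
    """Return True if the variant is rare or absent in every database."""
    freqs = variant.get("population_af", {})
    # None means the variant was not observed in that database.
    return all(af is None or af < max_af for af in freqs.values())


def filter_tumor_only(variants, max_af=MAX_POPULATION_AF):
    """Keep only variants that pass the population-frequency filter."""
    return [v for v in variants if is_candidate_somatic(v, max_af)]


# Illustrative records; identifiers and frequencies are made up.
variants = [
    {"id": "var_a", "population_af": {"gnomAD": 0.15, "KOVA2": 0.22}},
    {"id": "var_b", "population_af": {"gnomAD": None, "KOVA2": 0.0003}},
    {"id": "var_c", "population_af": {}},
]

kept = filter_tumor_only(variants)
print([v["id"] for v in kept])
```

Taking the strictest view across databases (any single common observation removes the variant) is one possible design choice; population-specific resources such as KOVA2 and KRGDB matter here because a variant can be rare globally yet common in Korean samples.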
Affiliation(s)
- Hajin Jeon
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB), Daejeon, Republic of Korea
- Jong Lyul Lee
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- Hyeran Shim
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Soobok Joe
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB), Daejeon, Republic of Korea
- Iksu Byeon
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB), Daejeon, Republic of Korea
- Chan Wook Kim
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- Seok-Byung Lim
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- In Ja Park
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- Yong Sik Yoon
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- Hoang Bao Khanh Chu
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Young-Joon Kim
  - Department of Biochemistry, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Chang Sik Yu
  - Department of Surgery, Division of Colon and Rectal Surgery, University of Ulsan College of Medicine and Asan Medical Center, Seoul, Republic of Korea
- Jin Ok Yang
  - Korea Bioinformation Center (KOBIC), Korea Research Institute of Bioscience & Biotechnology (KRIBB), Daejeon, Republic of Korea
2
Ahmad RM, Ali BR, Al-Jasmi F, Al Dhaheri N, Al Turki S, Kizhakkedath P, Mohamad MS. AI-derived comparative assessment of the performance of pathogenicity prediction tools on missense variants of breast cancer genes. Hum Genomics 2024; 18:99. [PMID: 39256852 PMCID: PMC11389290 DOI: 10.1186/s40246-024-00667-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/18/2024] [Accepted: 08/22/2024] [Indexed: 09/12/2024]
Abstract
Single nucleotide variants (SNVs) can exert substantial and extremely variable impacts on various cellular functions, making accurate predictions of their consequences challenging, albeit crucial, especially in clinical settings such as oncology. Laboratory-based experimental methods for assessing these effects are time-consuming and often impractical, highlighting the importance of in-silico tools for variant impact prediction. However, the performance metrics of currently available tools on breast cancer missense variants from benchmarking databases have not been thoroughly investigated, creating a knowledge gap in the accurate prediction of pathogenicity. In this study, the benchmarking datasets ClinVar and HGMD were used to evaluate 21 Artificial Intelligence (AI)-derived in-silico tools. Missense variants in breast cancer genes were extracted from ClinVar and HGMD professional v2023.1. The HGMD dataset focused on pathogenic variants only; to ensure balance, benign variants for the same genes were included from the ClinVar database. Interestingly, our analysis of both datasets revealed variants across genes with varying penetrance levels (low and moderate in addition to high), reinforcing the value of disease-specific tools. The top-performing tools identified on the ClinVar dataset were MutPred (Accuracy = 0.73), MetaRNN (Accuracy = 0.72), ClinPred (Accuracy = 0.71), Meta-SVM, REVEL, and Fathmm-XF (Accuracy = 0.70). On the HGMD dataset, they were ClinPred (Accuracy = 0.72), MetaRNN (Accuracy = 0.71), CADD (Accuracy = 0.69), Fathmm-MKL (Accuracy = 0.68), and Fathmm-XF (Accuracy = 0.67). These findings offer clinicians and researchers valuable insights for selecting, improving, and developing effective in-silico tools for breast cancer pathogenicity prediction. Bridging this knowledge gap contributes to advancing precision medicine and enhancing diagnostic and therapeutic approaches for breast cancer patients with potential implications for other conditions.
Affiliation(s)
- Rahaf M Ahmad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Bassam R Ali
  - Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Fatma Al-Jasmi
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
  - Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Noura Al Dhaheri
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
  - Division of Metabolic Genetics, Department of Pediatrics, Tawam Hospital, Al Ain, United Arab Emirates
- Saeed Al Turki
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Praseetha Kizhakkedath
  - Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
- Mohd Saberi Mohamad
  - Health Data Science Lab, Department of Genetics and Genomics, College of Medical and Health Sciences, United Arab Emirates University, Tawam road, Al Maqam district, Al Ain, Abu Dhabi, United Arab Emirates
  - Center for Engineering Computational Intelligence, Faculty of Engineering and Technology, Multimedia University, Melaka, Malaysia
3
Joshi B, de Lannoy C, Howarth MR, Kim SH, Joo C. iMAX FRET (Information Maximized FRET) for Multipoint Single-Molecule Structural Analysis. Nano Letters 2024; 24:8487-8494. [PMID: 38975639 PMCID: PMC11261617 DOI: 10.1021/acs.nanolett.4c00447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 07/02/2024] [Accepted: 07/02/2024] [Indexed: 07/09/2024]
Abstract
Understanding the structure of biomolecules is vital for deciphering their roles in biological systems. Single-molecule techniques have emerged as alternatives to conventional ensemble structure analysis methods for uncovering new biology in molecular dynamics and interaction studies, yet only limited structural information could be obtained experimentally. Here, we address this challenge by introducing iMAX FRET, a one-pot method that allows ab initio 3D profiling of individual molecules using two-color FRET measurements. Through the stochastic exchange of fluorescent weak binders, iMAX FRET simultaneously assesses multiple distances on a biomolecule within a few minutes, which can then be used to reconstruct the coordinates of up to four points in each molecule, allowing structure-based inference. We demonstrate the 3D reconstruction of DNA nanostructures, protein quaternary structures, and conformational changes in proteins. With iMAX FRET, we provide a powerful approach to advance the understanding of biomolecular structure by expanding conventional FRET analysis to three dimensions.
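How six pairwise distances fix the relative positions of four points can be illustrated with a short geometric sketch. This is plain trilateration on idealized, noise-free distances; it is not the authors' reconstruction procedure, and the function name and example coordinates are hypothetical. The result is defined only up to rotation, translation, and mirror reflection.

```python
import math
from itertools import combinations


def reconstruct_four_points(d12, d13, d14, d23, d24, d34):
    """Place four points in 3D from their six pairwise distances.

    Assumes d12 > 0 and that points 1, 2, 3 are not collinear.
    Point 1 sits at the origin, point 2 on the x-axis, point 3 in the
    xy-plane, and point 4 is given the non-negative z solution.
    """
    p1 = (0.0, 0.0, 0.0)
    p2 = (d12, 0.0, 0.0)

    # Point 3: intersect spheres around p1 and p2, restricted to z = 0.
    x3 = (d12**2 + d13**2 - d23**2) / (2 * d12)
    y3 = math.sqrt(max(d13**2 - x3**2, 0.0))
    p3 = (x3, y3, 0.0)

    # Point 4: same idea, with a third sphere around p3 fixing y.
    x4 = (d12**2 + d14**2 - d24**2) / (2 * d12)
    y4 = (d13**2 + d14**2 - d34**2 - 2 * x3 * x4) / (2 * y3)
    z4 = math.sqrt(max(d14**2 - x4**2 - y4**2, 0.0))
    return [p1, p2, p3, (x4, y4, z4)]


# Made-up coordinates used only to generate consistent distances.
true_points = [(0, 0, 0), (3, 0, 0), (1, 2, 0), (1, 1, 2)]
dist = {pair: math.dist(true_points[pair[0]], true_points[pair[1]])
        for pair in combinations(range(4), 2)}

points = reconstruct_four_points(dist[(0, 1)], dist[(0, 2)], dist[(0, 3)],
                                 dist[(1, 2)], dist[(1, 3)], dist[(2, 3)])

# Largest disagreement between input and reconstructed distances.
max_error = max(abs(math.dist(points[i], points[j]) - dist[(i, j)])
                for i, j in combinations(range(4), 2))
print(points, max_error)
```

With measured distances that carry noise, a least-squares fit would be a more robust choice than this closed-form placement.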
Affiliation(s)
- Bhagyashree S. Joshi
  - Kavli Institute of Nanoscience, Department of Bionanoscience, Delft University of Technology, Delft 2629HZ, The Netherlands
- Carlos de Lannoy
  - Kavli Institute of Nanoscience, Department of Bionanoscience, Delft University of Technology, Delft 2629HZ, The Netherlands
- Mark R. Howarth
  - Department of Biochemistry, University of Oxford, South Parks Road, Oxford OX1 3QU, U.K.
- Sung Hyun Kim
  - Kavli Institute of Nanoscience, Department of Bionanoscience, Delft University of Technology, Delft 2629HZ, The Netherlands
  - Department of Physics, Ewha Womans University, Seoul 03760, Republic of Korea
  - New and Renewable Energy Research Center, Ewha Womans University, Seoul 03760, Republic of Korea
- Chirlmin Joo
  - Kavli Institute of Nanoscience, Department of Bionanoscience, Delft University of Technology, Delft 2629HZ, The Netherlands
  - Department of Physics, Ewha Womans University, Seoul 03760, Republic of Korea
4
Vihinen M. Individual Genetic Heterogeneity. Genes (Basel) 2022; 13:1626. [PMID: 36140794 PMCID: PMC9498725 DOI: 10.3390/genes13091626] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/12/2022] [Revised: 08/25/2022] [Accepted: 09/08/2022] [Indexed: 11/28/2022]
Abstract
Genetic variation has been widely covered in the literature, but not from the perspective of an individual in any species. Here, a synthesis of genetic concepts and variations relevant for individual genetic constitution is provided. All the different levels of genetic information and variation are covered, ranging from whether an organism is unmixed or hybrid, has variations in genome, chromosomes, and more locally in DNA regions, to epigenetic variants or alterations in selfish genetic elements. Genetic constitution and heterogeneity of microbiota are highly relevant for health and wellbeing of an individual. Mutation rates vary widely for variation types, e.g., due to the sequence context. Genetic information guides numerous aspects in organisms. Types of inheritance, whether Mendelian or non-Mendelian, zygosity, sexual reproduction, and sex determination are covered. Functions of DNA and functional effects of variations are introduced, along with mechanisms that reduce and modulate functional effects, including TARAR countermeasures and intraindividual genetic conflict. TARAR countermeasures for tolerance, avoidance, repair, attenuation, and resistance are essential for life, integrity of genetic information, and gene expression. The genetic composition, effects of variations, and their expression are considered also in diseases and personalized medicine. The text synthesizes knowledge and insight on individual genetic heterogeneity and organizes and systematizes the central concepts.
Affiliation(s)
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22184 Lund, Sweden
5
Yazar M, Ozbek P. Assessment of 13 in silico pathogenicity methods on cancer-related variants. Comput Biol Med 2022; 145:105434. [DOI: 10.1016/j.compbiomed.2022.105434] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2022] [Revised: 03/04/2022] [Accepted: 03/20/2022] [Indexed: 11/03/2022]
6
The structure-based cancer-related single amino acid variation prediction. Sci Rep 2021; 11:13599. [PMID: 34193921 PMCID: PMC8245468 DOI: 10.1038/s41598-021-92793-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/18/2021] [Accepted: 06/16/2021] [Indexed: 11/09/2022]
Abstract
Single amino acid variation (SAV) is an amino acid substitution of the protein sequence that can potentially influence the entire protein structure or function, as well as its binding affinity. Protein destabilization is related to diseases, including several cancers, although using traditional experiments to clarify the relationship between SAVs and cancer requires considerable time and resources. Some SAV prediction methods use computational approaches, with most predicting SAV-induced changes in protein stability. In this investigation, all SAV characteristics generated from protein sequences, structures and the microenvironment were converted into feature vectors and fed into an integrated predicting system using a support vector machine and genetic algorithm. Critical features were used to estimate the relationship between their properties and cancers caused by SAVs. We describe how we developed a prediction system based on protein sequences and structure that is capable of distinguishing if the SAV is related to cancer or not. The five-fold cross-validation performance of our system is 89.73% for the accuracy, 0.74 for the Matthews correlation coefficient, and 0.81 for the F1 score. We have built an online prediction server, CanSavPre (http://bioinfo.cmu.edu.tw/CanSavPre/), which is expected to become a useful, practical tool for cancer research and precision medicine.
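The three performance measures quoted in this abstract (accuracy, Matthews correlation coefficient, and F1 score) all derive from the same binary confusion matrix. The sketch below shows the standard formulas; the counts in the example are invented for illustration and are unrelated to the reported results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, F1, and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # MCC uses all four cells, so it stays informative on imbalanced data.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Made-up counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```

Reporting MCC alongside accuracy is useful because accuracy alone can look high when one class dominates the dataset.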
7
Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. Database: The Journal of Biological Databases and Curation 2020; 2020:5710862. [PMID: 32016318 PMCID: PMC6997940 DOI: 10.1093/database/baz117] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/12/2019] [Revised: 06/03/2019] [Accepted: 07/01/2019] [Indexed: 02/07/2023]
Abstract
Development of new computational methods and testing their performance has to be carried out using experimental data. Only in comparison to existing knowledge can method performance be assessed. For that purpose, benchmark datasets with known and verified outcome are needed. High-quality benchmark datasets are valuable and may be difficult, laborious and time consuming to generate. VariBench and VariSNP are the two existing databases for sharing variation benchmark datasets used mainly for variation interpretation. They have been used for training and benchmarking predictors for various types of variations and their effects. VariBench was updated with 419 new datasets from 109 papers containing altogether 329 014 152 variants; however, there is plenty of redundancy between the datasets. VariBench is freely available at http://structure.bmc.lu.se/VariBench/. The contents of the datasets vary depending on information in the original source. The available datasets have been categorized into 20 groups and subgroups. There are datasets for insertions and deletions, substitutions in coding and non-coding region, structure mapped, synonymous and benign variants. Effect-specific datasets include DNA regulatory elements, RNA splicing, and protein property for aggregation, binding free energy, disorder and stability. Then there are several datasets for molecule-specific and disease-specific applications, as well as one dataset for variation phenotype effects. Variants are often described at three molecular levels (DNA, RNA and protein) and sometimes also at the protein structural level including relevant cross references and variant descriptions. The updated VariBench facilitates development and testing of new methods and comparison of obtained performances to previously published methods. We compared the performance of the pathogenicity/tolerance predictor PON-P2 to several benchmark studies, and show that such comparisons are feasible and useful; however, there may be limitations due to lack of provided details and shared data. Database URL: http://structure.bmc.lu.se/VariBench
Affiliation(s)
- Anasua Sarkar
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
- Yang Yang
  - School of Computer Science and Technology, Soochow University, No. 1 Shizi Street, Suzhou, 215006 Jiangsu, China
  - Provincial Key Laboratory for Computer Information Processing Technology, No. 1 Shizi Street, Soochow University, Suzhou, 215006 Jiangsu, China
- Mauno Vihinen
  - Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden
8
Ashford P, Pang CSM, Moya-García AA, Adeyelu T, Orengo CA. A CATH domain functional family based approach to identify putative cancer driver genes and driver mutations. Sci Rep 2019; 9:263. [PMID: 30670742 PMCID: PMC6343001 DOI: 10.1038/s41598-018-36401-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/03/2018] [Accepted: 11/13/2018] [Indexed: 12/31/2022]
Abstract
Tumour sequencing identifies highly recurrent point mutations in cancer driver genes, but rare functional mutations are hard to distinguish from large numbers of passengers. We developed a novel computational platform applying a multi-modal approach to filter out passengers and more robustly identify putative driver genes. The primary filter identifies enrichment of cancer mutations in CATH functional families (CATH-FunFams) – structurally and functionally coherent sets of evolutionary related domains. Using structural representatives from CATH-FunFams, we subsequently seek enrichment of mutations in 3D and show that these mutation clusters have a very significant tendency to lie close to known functional sites or conserved sites predicted using CATH-FunFams. Our third filter identifies enrichment of putative driver genes in functionally coherent protein network modules confirmed by literature analysis to be cancer associated. Our approach is complementary to other domain enrichment approaches exploiting Pfam families, but benefits from more functionally coherent groupings of domains. Using a set of mutations from 22 cancers we detect 151 putative cancer drivers, of which 79 are not listed in cancer resources and include recently validated cancer associated genes EPHA7, DCC netrin-1 receptor and zinc-finger protein ZNF479.
Affiliation(s)
- Paul Ashford
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Camilla S M Pang
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Aurelio A Moya-García
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
  - Laboratorio de Biología Molecular del Cáncer, Centro de Investigaciones Médico-Sanitarias (CIMES), Universidad de Málaga, Málaga, Spain
- Tolulope Adeyelu
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
- Christine A Orengo
  - Institute of Structural and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK
9
Schaafsma GCP, Vihinen M. Representativeness of variation benchmark datasets. BMC Bioinformatics 2018; 19:461. [PMID: 30497376 PMCID: PMC6267811 DOI: 10.1186/s12859-018-2478-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 02/25/2018] [Accepted: 11/09/2018] [Indexed: 12/14/2022]
Abstract
Background: Benchmark datasets are essential for both method development and performance assessment. These datasets have numerous requirements, representativeness being one. In the case of variant tolerance/pathogenicity prediction, representativeness means that the dataset covers the space of variations and their effects.
Results: We performed the first analysis of the representativeness of variation benchmark datasets. We used statistical approaches to investigate how proteins in the benchmark datasets were representative for the entire human protein universe. We investigated the distributions of variants in chromosomes, protein structures, CATH domains and classes, Pfam protein families, Enzyme Commission (EC) classifications and Gene Ontology annotations in 24 datasets that have been used for training and testing variant tolerance prediction methods. All the datasets were available in VariBench or VariSNP databases. We tested also whether the pathogenic variant datasets contained neutral variants defined as those that have high minor allele frequency in the ExAC database. The distributions of variants over the chromosomes and proteins varied greatly between the datasets.
Conclusions: None of the datasets was found to be well representative. Many of the tested datasets had quite good coverage of the different protein characteristics. Dataset size correlates to representativeness but only weakly to the performance of methods trained on them. The results imply that dataset representativeness is an important factor and should be taken into account in predictor development and testing.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2478-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Gerard C P Schaafsma
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84, Lund, Sweden
- Mauno Vihinen
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, BMC B13, SE-221 84, Lund, Sweden
10
Abstract
Somatic variations are frequent and important drivers in cancers. Amino acid substitutions can yield neoantigens that are detected by the immune system. Neoantigens can lead to immune response and tumor rejection. Although neoantigen load and occurrence have been widely studied, a detailed pan-cancer analysis of the occurrence and characterization of neoepitopes is missing. We investigated the proteome-wide amino acid substitutions in 8-, 9-, 10-, and 11-mer peptides in 30 cancer types with the NetMHC 4.0 software. 11,316,078 (0.24%) of the predicted 8-, 9-, 10-, and 11-mer peptides were highly likely neoepitope candidates and were derived from 95.44% of human proteins. Binding affinity to MHC molecules is just one of the many epitope features. The most likely epitopes are those which are detected by several MHCs and of several peptide lengths. 9-mer peptides are the most common among the high binding neoantigens. 0.17% of all variants yield more than 100 neoepitopes and are considered as the best candidates for any application. Amino acid distributions indicate that variants at all positions in neoepitopes of any length are, on average, more hydrophobic than the wild-type residues. We characterized properties of neoepitopes in 30 cancer types and estimated the likely numbers of tumor-derived epitopes that could induce an immune response. We found that amino acid distributions, at all positions in neoepitopes of all lengths, contain more hydrophobic residues than the wild-type sequences implying that the hydropathy nature of neoepitopes is an important property. The neoepitope characteristics can be employed for various applications including targeted cancer vaccine development for precision medicine.
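The peptide enumeration that precedes binding prediction in an analysis like this one can be sketched briefly: for a given amino acid substitution, collect every 8-, 9-, 10-, and 11-mer window that contains the substituted residue. The helper names and the toy sequence below are hypothetical, and the MHC binding prediction itself (done with NetMHC 4.0 in the study) is not reproduced here.

```python
def apply_substitution(sequence, position, new_residue):
    """Return the sequence with the residue at 0-based `position` replaced."""
    return sequence[:position] + new_residue + sequence[position + 1:]


def peptides_spanning(sequence, position, lengths=(8, 9, 10, 11)):
    """List every window of the given lengths that covers `position`."""
    peptides = []
    for k in lengths:
        # Earliest and latest window starts that still include the position
        # and stay inside the sequence.
        first = max(0, position - k + 1)
        last = min(position, len(sequence) - k)
        for start in range(first, last + 1):
            peptides.append(sequence[start:start + k])
    return peptides


# Toy 20-residue sequence; the substitution M -> A at index 10 is made up.
wild_type = "ACDEFGHIKLMNPQRSTVWY"
mutant = apply_substitution(wild_type, 10, "A")

candidates = peptides_spanning(mutant, 10)
print(len(candidates), candidates[:3])
```

Each candidate would then be scored against a panel of MHC alleles; a substitution near the end of a protein yields fewer windows because some start positions fall outside the sequence.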
11
Schaafsma GCP, Vihinen M. Large differences in proportions of harmful and benign amino acid substitutions between proteins and diseases. Hum Mutat 2017; 38:839-848. [DOI: 10.1002/humu.23236] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/22/2017] [Revised: 04/05/2017] [Accepted: 04/20/2017] [Indexed: 12/21/2022]
Affiliation(s)
- Gerard C. P. Schaafsma
  - Protein Structure and Bioinformatics, Department of Experimental Medical Science, Lund University, Lund, Sweden
12
Niroula A, Vihinen M. PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned. Hum Mutat 2017; 38:1085-1091. [PMID: 28224672 DOI: 10.1002/humu.23199] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/24/2016] [Revised: 01/25/2017] [Accepted: 02/17/2017] [Indexed: 01/14/2023]
Abstract
Computational tools are widely used for ranking and prioritizing variants for characterizing their disease relevance. Since numerous tools have been developed, they have to be properly assessed before being applied. Critical Assessment of Genome Interpretation (CAGI) experiments have significantly contributed toward the assessment of prediction methods for various tasks. Within and outside the CAGI, we have addressed several questions that facilitate development and assessment of variation interpretation tools. These areas include collection and distribution of benchmark datasets, their use for systematic large-scale method assessment, and the development of guidelines for reporting methods and their performance. For us, CAGI has provided a chance to experiment with new ideas, test the application areas of our methods, and network with other prediction method developers. In this article, we discuss our experiences and lessons learned from the various CAGI challenges. We describe our approaches, their performance, and impact of CAGI on our research. Finally, we discuss some of the possibilities that CAGI experiments have opened up and make some suggestions for future experiments.
Affiliation(s)
- Abhishek Niroula
  - Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden
- Mauno Vihinen
  - Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden
13
Jang K, Kim K, Cho A, Lee I, Choi JK. Network perturbation by recurrent regulatory variants in cancer. PLoS Comput Biol 2017; 13:e1005449. [PMID: 28333928 PMCID: PMC5383347 DOI: 10.1371/journal.pcbi.1005449] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 10/17/2016] [Revised: 04/06/2017] [Accepted: 03/10/2017] [Indexed: 12/12/2022]
Abstract
Cancer driving genes have been identified as recurrently affected by variants that alter protein-coding sequences. However, a majority of cancer variants arise in noncoding regions, and some of them are thought to play a critical role through transcriptional perturbation. Here we identified putative transcriptional driver genes based on combinatorial variant recurrence in cis-regulatory regions. The identified genes showed high connectivity in the cancer type-specific transcription regulatory network, with high outdegree and many downstream genes, highlighting their causative role during tumorigenesis. In the protein interactome, the identified transcriptional drivers were not as highly connected as coding driver genes but appeared to form a network module centered on the coding drivers. The coding and regulatory variants associated via these interactions between the coding and transcriptional drivers showed exclusive and complementary occurrence patterns across tumor samples. Transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Identifying driver variants is a current challenge facing cancer genomics. A well-established and robust method for this is to find recurrence in large cohorts of samples. Recurrence patterns of amino acid-changing variants can reveal oncogenes and tumor suppressor genes. However, such single-gene approaches have limitations because of rare variants. Therefore, recurrently affected protein complexes, network modules, or signaling pathways have been identified based on network-level recurrence. Here we dissect chromatin interactome to identify cis-regulatory variants that show high gene-level recurrence. We then employ the gene regulatory network and protein interactome to characterize putative cancer genes with cis-regulatory variant recurrence. These genes were located at critical positions in the regulatory network. By contrast, they are at the circumference in the protein interactome; instead, they form a network module with coding cancer genes located at hub positions. Furthermore, the coding and regulatory variants associated via these interactions showed exclusive and complementary occurrence patterns across tumor samples. Therefore, we suggest that transcriptional cancer drivers may act through an extensive perturbation of the regulatory network and by altering protein network modules through interactions with coding driver genes.
Affiliation(s)
- Kiwon Jang
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Kwoneel Kim
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
- Ara Cho
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul, Republic of Korea
- Jung Kyoon Choi
  - Department of Bio and Brain Engineering, KAIST, Daejeon, Republic of Korea
14
Vihinen M. Establishment of an international database for genetic variants in esophageal cancer. Ann N Y Acad Sci 2016; 1381:45-49. [PMID: 27442983 DOI: 10.1111/nyas.13152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/01/2016] [Revised: 05/20/2016] [Accepted: 05/25/2016] [Indexed: 11/29/2022]
Abstract
The establishment of a database has been suggested in order to collect, organize, and distribute genetic information about esophageal cancer. The World Organization for Specialized Studies on Diseases of the Esophagus and the Human Variome Project will be in charge of a central database of information about esophageal cancer-related variations from publications, databases, and laboratories; in addition to genetic details, clinical parameters will also be included. The aim will be to get all the central players in research, clinical, and commercial laboratories to contribute. The database will follow established recommendations and guidelines. The database will require a team of dedicated curators with different backgrounds. Numerous layers of systematics will be applied to facilitate computational analyses. The data items will be extensively integrated with other information sources. The database will be distributed as open access to ensure exchange of the data with other databases. Variations will be reported in relation to reference sequences on three levels (DNA, RNA, and protein) whenever applicable. In the first phase, the database will concentrate on genetic variations including both somatic and germline variations for susceptibility genes. Additional types of information can be integrated at a later stage.
Affiliation(s)
- Mauno Vihinen
  - Protein Structure and Bioinformatics Group, Department of Experimental Medical Science, Lund University, Lund, Sweden
15
Niroula A, Vihinen M. Variation Interpretation Predictors: Principles, Types, Performance, and Choice. Hum Mutat 2016; 37:579-97. [DOI: 10.1002/humu.22987] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 11/16/2015] [Accepted: 03/07/2016] [Indexed: 12/18/2022]
Affiliation(s)
- Abhishek Niroula
  - Department of Experimental Medical Science, Lund University, BMC B13, Lund SE-22184, Sweden
- Mauno Vihinen
  - Department of Experimental Medical Science, Lund University, BMC B13, Lund SE-22184, Sweden