1
Giacoletto CJ, Valente LJ, Brown L, Patterson S, Gokhale R, Mockus SM, Grody WW, Deng HW, Rotter JI, Schiller MR. New Gain-of-Function Mutations Prioritize Mechanisms of HER2 Activation. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2025:2025.03.03.25323043. [PMID: 40093211 PMCID: PMC11908269 DOI: 10.1101/2025.03.03.25323043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Abstract
ERBB2 (HER2) is a well-studied oncogene with several driver mutations apart from the well-known amplification defect in some breast cancers. We used the GigaAssay to test the functional effect of HER2 missense mutations on its receptor tyrosine kinase function. The GigaAssay is a modular high-throughput one-pot assay system for simultaneously measuring molecular function of thousands of genetic variants at very high accuracy. The activities of 5,886 mutations were classified, significantly more than mutants previously reported. These variants include 112 new in vitro, 10 known, and 9 new in vivo gain-of-function (GOF) mutations. Many of the GOFs spatially cluster in sequence and structure, supporting the activation mechanisms of heterodimerization with EGFR and release of kinase inhibition by the juxtamembrane domain. Retrospective analysis of patient outcomes from the Genomic Data Commons predicts increased survival with the newly identified HER2 GOF variants.
Affiliation(s)
- Christopher J Giacoletto
- Heligenics Inc., 10530 Discovery Dr., Las Vegas, NV 89135 USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, Las Vegas, Nevada, 89154 USA
- School of Life Sciences, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, Las Vegas, Nevada, 89154 USA
| | - Liz J Valente
- Heligenics Inc., 10530 Discovery Dr., Las Vegas, NV 89135 USA
| | - Lancer Brown
- Heligenics Inc., 10530 Discovery Dr., Las Vegas, NV 89135 USA
| | - Sara Patterson
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Dr., Farmington, CT 06032
| | - Rewatee Gokhale
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Dr., Farmington, CT 06032
| | | | | | - Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University, New Orleans, 70112 USA
| | - Jerome I Rotter
- Heligenics Inc., 10530 Discovery Dr., Las Vegas, NV 89135 USA
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502 USA
| | - Martin R Schiller
- Heligenics Inc., 10530 Discovery Dr., Las Vegas, NV 89135 USA
- Nevada Institute of Personalized Medicine, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, Las Vegas, Nevada, 89154 USA
- School of Life Sciences, University of Nevada, Las Vegas, 4505 S. Maryland Parkway, Las Vegas, Nevada, 89154 USA
2
Zhang L, Deng T, Liufu Z, Chen X, Wu S, Liu X, Shi C, Chen B, Hu Z, Cai Q, Liu C, Li M, Tracy ME, Lu X, Wu CI, Wen HJ. Characterization of cancer-driving nucleotides (CDNs) across genes, cancer types, and patients. eLife 2024; 13:RP99341. [PMID: 39688957 DOI: 10.7554/elife.99341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2024] Open
Abstract
A central goal of cancer genomics is to identify, in each patient, all the cancer-driving mutations. Among them, point mutations are referred to as cancer-driving nucleotides (CDNs), which recur in cancers. The companion study shows that the probability of i recurrent hits in n patients would decrease exponentially with i; hence, any mutation with i ≥ 3 hits in The Cancer Genome Atlas (TCGA) database is a high-probability CDN. This study characterizes the 50-150 CDNs identifiable for each cancer type of TCGA (while anticipating 10 times more undiscovered ones) as follows: (i) CDNs tend to code for amino acids of divergent chemical properties. (ii) At the genic level, far more CDNs (more than fivefold) fall on noncanonical than canonical cancer-driving genes (CDGs). Most undiscovered CDNs are expected to be on unknown CDGs. (iii) CDNs tend to be more widely shared among cancer types than canonical CDGs, mainly because of the higher resolution at the nucleotide than the whole-gene level. (iv) Most important, among the 50-100 coding region mutations carried by a cancer patient, 5-8 CDNs are expected but only 0-2 CDNs have been identified at present. This low level of identification has hampered functional tests and gene-targeted therapy. We show that, by expanding the sample size to 10^5, most CDNs can be identified. Full CDN identification will then facilitate the design of patient-specific targeting against multiple CDN-harboring genes.
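The recurrence criterion in this abstract (a site hit in i ≥ 3 patients is treated as a high-probability CDN) can be illustrated with a minimal sketch. The function name `candidate_cdns`, the site labels, and the toy patient data are hypothetical and are not taken from the study; the sketch only counts how many patients carry each site and applies the threshold, and it does not reproduce the statistical model of the companion study.

```python
from collections import Counter


def candidate_cdns(patient_mutations, min_hits=3):
    """Return {site: hits} for sites mutated in at least `min_hits` patients.

    `patient_mutations` maps a patient ID to the list of mutated sites
    observed in that patient (hypothetical labels in this sketch).
    """
    hits = Counter()
    for sites in patient_mutations.values():
        # Count each site once per patient, so a duplicated call within
        # one patient does not inflate the recurrence count.
        hits.update(set(sites))
    return {site: n for site, n in hits.items() if n >= min_hits}


# Hypothetical toy data: four patients, three distinct sites.
toy = {
    "P1": ["GENE_A:100C>T", "GENE_B:52G>A"],
    "P2": ["GENE_A:100C>T", "GENE_C:7A>G"],
    "P3": ["GENE_A:100C>T", "GENE_B:52G>A"],
    "P4": ["GENE_C:7A>G", "GENE_C:7A>G"],
}

print(candidate_cdns(toy))
```

With this toy input only the site carried by three patients passes the default threshold; lowering `min_hits` to 2 would admit the other two sites as well.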
Affiliation(s)
- Lingjie Zhang
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Tong Deng
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Zhongqi Liufu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Center for Excellence in Animal Evolution and Genetics, The Chinese Academy of Sciences, Kunming, China
| | - Xiangnyu Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Shijie Wu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Xueyu Liu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Changhao Shi
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Bingjie Chen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- GMU-GIBH Joint School of Life Sciences, Guangzhou Medical University, Guangzhou, China
| | - Zheng Hu
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Qichun Cai
- Cancer Center, Clifford Hospital, Jinan University, Guangzhou, China
| | - Chenli Liu
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Mengfeng Li
- Cancer Research Institute, School of Basic Medical Sciences, Southern Medical University, Guangzhou, China
| | - Miles E Tracy
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
| | - Xuemei Lu
- Center for Excellence in Animal Evolution and Genetics, The Chinese Academy of Sciences, Kunming, China
| | - Chung-I Wu
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Department of Ecology and Evolution, University of Chicago, Chicago, United States
| | - Hai-Jun Wen
- State Key Laboratory of Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
3
An J, Nam CH, Kim R, Lee Y, Won H, Park S, Lee WH, Park H, Yoon CJ, An Y, Kim JH, Jun JK, Bae JM, Shin EC, Kim B, Cha YJ, Kwon HW, Oh JW, Park JY, Kim MJ, Ju YS. Mitochondrial DNA mosaicism in normal human somatic cells. Nat Genet 2024; 56:1665-1677. [PMID: 39039280 PMCID: PMC11319206 DOI: 10.1038/s41588-024-01838-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Accepted: 06/21/2024] [Indexed: 07/24/2024]
Abstract
Somatic cells accumulate genomic alterations with age; however, our understanding of mitochondrial DNA (mtDNA) mosaicism remains limited. Here we investigated the genomes of 2,096 clones derived from three cell types across 31 donors, identifying 6,451 mtDNA variants with heteroplasmy levels of ≳0.3%. While the majority of these variants were unique to individual clones, suggesting stochastic acquisition with age, 409 variants (6%) were shared across multiple embryonic lineages, indicating their origin from heteroplasmy in fertilized eggs. The mutational spectrum exhibited replication-strand bias, implicating mtDNA replication as a major mutational process. We evaluated the mtDNA mutation rate (5.0 × 10^-8 per base pair) and a turnover frequency of 10-20 per year, which are fundamental components shaping the landscape of mtDNA mosaicism over a lifetime. The expansion of mtDNA-truncating mutations toward homoplasmy was substantially suppressed. Our findings provide comprehensive insights into the origins, dynamics and functional consequences of mtDNA mosaicism in human somatic cells.
Affiliation(s)
- Jisong An
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Chang Hyun Nam
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Ryul Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Inocras Inc, Daejeon, Republic of Korea
| | - Yunah Lee
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Hyein Won
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Seongyeol Park
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Inocras Inc, Daejeon, Republic of Korea
| | - Won Hee Lee
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Hansol Park
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Inocras Inc, Daejeon, Republic of Korea
| | - Christopher J Yoon
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
- Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
| | - Yohan An
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Jie-Hyun Kim
- Department of Internal Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jong Kwan Jun
- Department of Obstetrics and Gynecology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Jeong Mo Bae
- Department of Pathology, Seoul National University Hospital, Seoul, Republic of Korea
| | - Eui-Cheol Shin
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Bun Kim
- Center for Colorectal Cancer, Research Institute and Hospital, National Cancer Center, Goyang, Republic of Korea
| | - Yong Jun Cha
- Center for Colorectal Cancer, Research Institute and Hospital, National Cancer Center, Goyang, Republic of Korea
| | - Hyun Woo Kwon
- Department of Nuclear Medicine, Korea University College of Medicine, Seoul, Republic of Korea
| | - Ji Won Oh
- Department of Anatomy, Yonsei University College of Medicine, Seoul, Republic of Korea
| | - Jee Yoon Park
- Department of Obstetrics and Gynecology, Seoul National University Bundang Hospital, Seongnam, Republic of Korea
- Department of Obstetrics and Gynecology, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Min Jung Kim
- Department of Surgery, Seoul National University College of Medicine, Seoul, Republic of Korea
| | - Young Seok Ju
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.
- Inocras Inc, Daejeon, Republic of Korea.
4
Rao Y, Ahmed N, Pritchard J, O'Brien EP. Incorporating mutational heterogeneity to identify genes that are enriched for synonymous mutations in cancer. BMC Bioinformatics 2023; 24:462. [PMID: 38062391 PMCID: PMC10704839 DOI: 10.1186/s12859-023-05521-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2023] [Accepted: 10/05/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND: Synonymous mutations, which change the DNA sequence but not the encoded protein sequence, can affect protein structure and function, mRNA maturation, and mRNA half-lives. The possibility that synonymous mutations might be enriched in cancer has been explored in several recent studies. However, none of these studies control for all three types of mutational heterogeneity (patient, histology, and gene) that are known to affect the accurate identification of non-synonymous cancer-associated genes. Our goal is to adopt the current standard for non-synonymous mutations in an investigation of synonymous mutations. RESULTS: Here, we create an algorithm, MutSigCVsyn, an adaptation of MutSigCV, to identify cancer-associated genes that are enriched for synonymous mutations based on a non-coding background model that takes into account the mutational heterogeneity across these levels. Using MutSigCVsyn, we first analyzed 2572 cancer whole-genome samples from the Pan-cancer Analysis of Whole Genomes (PCAWG) to identify non-synonymous cancer drivers as a quality control. Indicative of the algorithm's accuracy, we find that 58.6% of these candidate genes were also found in the Cancer Gene Census (CGC) list, and 66.2% were found within the PCAWG cancer driver list. We then applied it to identify 30 putative cancer-associated genes that are enriched for synonymous mutations within the same samples. One of the promising gene candidates is the B cell lymphoma 2 (BCL-2) gene. BCL-2 regulates apoptosis by antagonizing the action of proapoptotic BCL-2 family member proteins. The synonymous mutations in BCL2 are enriched in its anti-apoptotic domain and likely play a role in cancer cell proliferation. CONCLUSION: Our study introduces MutSigCVsyn, an algorithm that accounts for mutational heterogeneity at patient, histology, and gene levels, to identify cancer-associated genes that are enriched for synonymous mutations using whole genome sequencing data. We identified 30 putative candidate genes that will benefit from future experimental studies on the role of synonymous mutations in cancer biology.
Affiliation(s)
- Yiyun Rao
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, State College, PA, 16802, USA
| | - Nabeel Ahmed
- Huck Institute of the Life Sciences, Pennsylvania State University, University Park, State College, PA, 16802, USA
- Moderna, Inc., Cambridge, USA
| | - Justin Pritchard
- Department of Biomedical Engineering, Pennsylvania State University, University Park, State College, PA, 16802, USA.
| | - Edward P O'Brien
- Department of Chemistry, Pennsylvania State University, University Park, State College, PA, 16802, USA.
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, State College, PA, 16802, USA.
5
Yi K, Kim SY, Bleazard T, Kim T, Youk J, Ju YS. Mutational spectrum of SARS-CoV-2 during the global pandemic. Exp Mol Med 2021; 53:1229-1237. [PMID: 34453107 PMCID: PMC8393781 DOI: 10.1038/s12276-021-00658-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 11/30/2020] [Revised: 04/29/2021] [Accepted: 05/11/2021] [Indexed: 02/07/2023] Open
Abstract
Viruses accumulate mutations under the influence of natural selection and host-virus interactions. Through a systematic comparison of 351,525 full viral genome sequences collected during the recent COVID-19 pandemic, we reveal the spectrum of SARS-CoV-2 mutations. Unlike those of other viruses, the mutational spectrum of SARS-CoV-2 exhibits extreme asymmetry, with a much higher rate of C>U than U>C substitutions, as well as a higher rate of G>U than U>G substitutions. This suggests directional genome sequence evolution during transmission. The substantial asymmetry and directionality of the mutational spectrum enable pseudotemporal tracing of SARS-CoV-2 without prior information about the root sequence, collection time, and sampling region. This shows that the viral genome sequences collected in Asia are similar to the original genome sequence. Adjusted estimation of the dN/dS ratio accounting for the asymmetrical mutational spectrum also shows evidence of negative selection on viral genes, consistent with previous reports. Our findings provide deep insights into the mutational processes in SARS-CoV-2 viral infection and advance the understanding of the history and future evolution of the virus.
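The asymmetry described in this abstract (C>U far more frequent than U>C, and G>U more frequent than U>G) can be made concrete with a small sketch that compares forward and reverse substitution counts. The function name `substitution_asymmetry` and the toy substitution list are hypothetical; the sketch uses raw counts only and, as a simplifying assumption, does not normalize by genome base composition or otherwise reproduce the study's rate estimation.

```python
from collections import Counter


def substitution_asymmetry(substitutions, forward, reverse):
    """Ratio of forward to reverse substitution counts, e.g. C>U vs U>C.

    Returns None when the reverse class was never observed, because the
    ratio is undefined in that case.
    """
    counts = Counter(substitutions)
    if counts[reverse] == 0:
        return None
    return counts[forward] / counts[reverse]


# Hypothetical toy list of observed substitutions (not real data).
toy_subs = ["C>U"] * 8 + ["U>C"] * 2 + ["G>U"] * 3 + ["U>G"] * 1

print(substitution_asymmetry(toy_subs, "C>U", "U>C"))
print(substitution_asymmetry(toy_subs, "G>U", "U>G"))
```

A ratio well above 1 for a pair indicates directional bias in the raw counts; a proper rate comparison would also divide each count by the number of available source bases.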
Affiliation(s)
- Kijong Yi
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Su Yeon Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Thomas Bleazard
- National Institute for Biological Standards and Control, Blanche Lane, South Mimms, Potters Bar, Hertfordshire, EN6 3QG UK
| | - Taewoo Kim
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
| | - Jeonghwan Youk
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- GENOME INSIGHT Inc, Daejeon, 34051 Korea
| | - Young Seok Ju
- Graduate School of Medical Science and Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, 34141 Korea
- GENOME INSIGHT Inc, Daejeon, 34051 Korea
6
Meyerson W, Leisman J, Navarro FCP, Gerstein M. Origins and characterization of variants shared between databases of somatic and germline human mutations. BMC Bioinformatics 2020; 21:227. [PMID: 32498674 PMCID: PMC7273669 DOI: 10.1186/s12859-020-3508-8] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/24/2020] [Accepted: 04/20/2020] [Indexed: 01/26/2023] Open
Abstract
Background: Mutations arise in the human genome in two major settings: the germline and the soma. These settings involve different inheritance patterns, time scales, chromatin structures, and environmental exposures, all of which impact the resulting distribution of substitutions. Nonetheless, many of the same single nucleotide variants (SNVs) are shared between germline and somatic mutation databases, such as between the gnomAD database of 120,000 germline exomes and the TCGA database of 10,000 somatic exomes. Here, we sought to explain this overlap. Results: After strict filtering to exclude common germline polymorphisms and sites with poor coverage or mappability, we found 336,987 variants shared between the somatic and germline databases. A uniform statistical model explains 34% of these shared variants; a model that incorporates the varying mutation rates of the basic mutation types explains another 50% of shared variants; and a model that includes extended nucleotide contexts (e.g. surrounding 3 bases on either side) explains an additional 4% of shared variants. Analysis of read depth finds mixed evidence that up to 4% of the shared variants may represent germline variants leaked into somatic call sets. 9% of the shared variants are not explained by any model. Sequencing errors and convergent evolution did not account for these. We surveyed other factors as well: Cancers driven by endogenous mutational processes share a greater fraction of variants with the germline, and recently derived germline variants were more likely to be somatically shared than were ancient germline ones. Conclusions: Overall, we find that shared variants largely represent bona fide biological occurrences of the same variant in the germline and somatic setting and arise primarily because DNA has some of the same basic chemical vulnerabilities in either setting. Moreover, we find mixed evidence that somatic call-sets leak appreciable numbers of germline variants, which is relevant to genomic privacy regulations. In future studies, the similar chemical vulnerability of DNA between the somatic and germline settings might be used to help identify disease-related genes by guiding the development of background-mutation models that are informed by both somatic and germline patterns of variation.
Affiliation(s)
- William Meyerson
- Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Yale School of Medicine, Yale University, New Haven, CT, 06510, USA
| | - John Leisman
- Molecular, Cellular and Developmental Biology, Yale University, New Haven, CT, 06510, USA
| | - Fabio C P Navarro
- Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, 06511, USA
| | - Mark Gerstein
- Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06511, USA
- Yale School of Medicine, Yale University, New Haven, CT, 06510, USA
- Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, 06511, USA
- Department of Computer Science, Yale University, New Haven, CT, 06511, USA
7
Williams MJ, Zapata L, Werner B, Barnes CP, Sottoriva A, Graham TA. Measuring the distribution of fitness effects in somatic evolution by combining clonal dynamics with dN/dS ratios. eLife 2020; 9:e48714. [PMID: 32223898 PMCID: PMC7105384 DOI: 10.7554/elife.48714] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 05/23/2019] [Accepted: 03/09/2020] [Indexed: 12/22/2022] Open
Abstract
The distribution of fitness effects (DFE) defines how new mutations spread through an evolving population. The ratio of non-synonymous to synonymous mutations (dN/dS) has become a popular method to detect selection in somatic cells. However the link, in somatic evolution, between dN/dS values and fitness coefficients is missing. Here we present a quantitative model of somatic evolutionary dynamics that determines the selective coefficients of individual driver mutations from dN/dS estimates. We then measure the DFE for somatic mutant clones in ostensibly normal oesophagus and skin. We reveal a broad distribution of fitness effects, with the largest fitness increases found for TP53 and NOTCH1 mutants (proliferative bias 1-5%). This study provides the theoretical link between dN/dS values and selective coefficients in somatic evolution, and measures the DFE of mutations in human tissues.
Affiliation(s)
- Marc J Williams
- Centre for Genomics and Computational Biology, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, United States
| | - Luis Zapata
- Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
| | - Benjamin Werner
- Centre for Genomics and Computational Biology, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Chris P Barnes
- Department of Cell and Developmental Biology, University College London, London, United Kingdom
| | - Andrea Sottoriva
- Evolutionary Genomics and Modelling Lab, Centre for Evolution and Cancer, The Institute of Cancer Research, London, United Kingdom
| | - Trevor A Graham
- Centre for Genomics and Computational Biology, Barts Cancer Institute, Barts and the London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
8
Salam T, Premila Devi S, Duncan Lyngdoh RH. Molecular criteria for mutagenesis by DNA methylation: Some computational elucidations. Mutat Res 2018; 807:10-20. [PMID: 29220701 DOI: 10.1016/j.mrfmmm.2017.10.004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/23/2017] [Revised: 10/05/2017] [Accepted: 10/25/2017] [Indexed: 06/07/2023]
Abstract
Alkylating agents and N-nitroso compounds are well-known mutagens and carcinogens which act by alkylating DNA at the nucleobase moieties. Criteria for mutagenicity through DNA alkylation include (a) absence of the Watson-Crick (N1-guanine and N3-thymine) protons, (b) rotation of the alkyl group away from the H-bonding zone, (c) configuration of the alkylated base pair close to the Watson-Crick type. This computational study brings together these three molecular criteria for the first time. Three methylated DNA bases (N7-methylguanine, O6-methylguanine and O4-methylthymine) are studied using computational chemical methods. Watson-Crick proton loss is predicted to be more feasible for the mutagenic O6-methylguanine and O4-methylthymine than for the non-mutagenic N7-methylguanine, in agreement with the observed trend for pKa values. Attainment of a conformer conducive to mutagenesis is more feasible for O6-methylguanine than for O4-methylthymine, though the latter is more mutagenic. These methylated bases yield 9 H-bonded pairs with normal DNA bases. At biological pH, O6-methylguanine and O4-methylthymine would yield stable mutagenic pairs having Watson-Crick type configuration by H-bonded pairing with thymine and guanine respectively, while N7-methylguanine would yield a non-mutagenic pair with cytosine. The three criteria thus well differentiate the non-mutagenic N7-methylguanine from the mutagenic O6-methylguanine and O4-methylthymine in good accord with experimental observations.
Affiliation(s)
- Tejeshwori Salam
- Department of Chemistry, North-Eastern Hill University, Shillong 793022, India
| | - S Premila Devi
- Department of Chemistry, North-Eastern Hill University, Shillong 793022, India
| | - R H Duncan Lyngdoh
- Department of Chemistry, North-Eastern Hill University, Shillong 793022, India.
9
Martincorena I, Raine KM, Gerstung M, Dawson KJ, Haase K, Van Loo P, Davies H, Stratton MR, Campbell PJ. Universal Patterns of Selection in Cancer and Somatic Tissues. Cell 2017; 171:1029-1041.e21. [PMID: 29056346 PMCID: PMC5720395 DOI: 10.1016/j.cell.2017.09.042] [Citation(s) in RCA: 894] [Impact Index Per Article: 111.8] [Received: 04/13/2017] [Revised: 08/09/2017] [Accepted: 09/22/2017] [Indexed: 01/17/2023]
Abstract
Cancer develops as a result of somatic mutation and clonal selection, but quantitative measures of selection in cancer evolution are lacking. We adapted methods from molecular evolution and applied them to 7,664 tumors across 29 cancer types. Unlike species evolution, positive selection outweighs negative selection during cancer development. On average, <1 coding base substitution/tumor is lost through negative selection, with purifying selection almost absent outside homozygous loss of essential genes. This allows exome-wide enumeration of all driver coding mutations, including outside known cancer genes. On average, tumors carry ∼4 coding substitutions under positive selection, ranging from <1/tumor in thyroid and testicular cancers to >10/tumor in endometrial and colorectal cancers. Half of driver substitutions occur in yet-to-be-discovered cancer genes. With increasing mutation burden, numbers of driver mutations increase, but not linearly. We systematically catalog cancer genes and show that genes vary extensively in what proportion of mutations are drivers versus passengers.
Affiliation(s)
| | - Keiran M Raine
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Moritz Gerstung
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Hinxton CB10 1SD, UK
| | - Kevin J Dawson
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | | | - Peter Van Loo
- The Francis Crick Institute, London NW1 1AT, UK; Department of Human Genetics, University of Leuven, Leuven 3000, Belgium
| | - Helen Davies
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | | | - Peter J Campbell
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK; Department of Haematology, University of Cambridge, Cambridge CB2 2XY, UK.
10
Golestan Hashemi FS, Razi Ismail M, Rafii Yusop M, Golestan Hashemi MS, Nadimi Shahraki MH, Rastegari H, Miah G, Aslani F. Intelligent mining of large-scale bio-data: Bioinformatics applications. BIOTECHNOL BIOTEC EQ 2017. [DOI: 10.1080/13102818.2017.1364977] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/05/2023] Open
Affiliation(s)
- Farahnaz Sadat Golestan Hashemi
- Plant Genetics, AgroBioChem Department, Gembloux Agro-Bio Tech, University of Liege, Liege, Belgium
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Razi Ismail
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mohd Rafii Yusop
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Mahboobe Sadat Golestan Hashemi
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Mohammad Hossein Nadimi Shahraki
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
- Big Data Research Center, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Hamid Rastegari
- Department of Software Engineering, Faculty of Computer Engineering, Najafabad Branch, Islamic Azad University, Isfahan, Iran
| | - Gous Miah
- Laboratory of Food Crops, Institute of Tropical Agriculture and Food Security, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
| | - Farzad Aslani
- Department of Crop Science, Faculty of Agriculture, Universiti Putra Malaysia, Serdang, Selangor, Malaysia
|
11
|
Wu X, Li G. Prevalent Accumulation of Non-Optimal Codons through Somatic Mutations in Human Cancers. PLoS One 2016; 11:e0160463. [PMID: 27513638 PMCID: PMC4981346 DOI: 10.1371/journal.pone.0160463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/26/2015] [Accepted: 07/19/2016] [Indexed: 11/27/2022] Open
Abstract
Cancer is characterized by uncontrolled cell growth, and the cause of different cancers is generally attributed to checkpoint dysregulation of cell proliferation and apoptosis. Recent studies have shown that non-optimal codons were preferentially adopted by genes to generate cell cycle-dependent oscillations in protein levels. This raises the intriguing question of how dynamic changes of codon usage modulate the cancer genome to cope with a non-controlled proliferative cell cycle. In this study, we comprehensively analyzed the somatic mutations of codons in human cancers, and found that non-optimal codons tended to be accumulated through both synonymous and non-synonymous mutations compared with other types of genomic substitution. We further demonstrated that non-optimal codons were prevalently accumulated across different types of cancers, amino acids, and chromosomes, and genes with accumulation of non-optimal codons tended to be involved in protein interaction/signaling networks and encoded important enzymes in metabolic networks that played roles in cancer-related pathways. This study provides insights into the dynamics of codons in the cancer genome and demonstrates that accumulation of non-optimal codons may be an adaptive strategy for cancerous cells to win the competition with normal cells. This deeper interpretation of the patterns and the functional characterization of somatic mutations of codons will help to broaden the current understanding of the molecular basis of cancers.
Affiliation(s)
- Xudong Wu
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd., Dalian 116023, PR China
| | - Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, 457 Zhongshan Rd., Dalian 116023, PR China
|
12
|
Martincorena I, Roshan A, Gerstung M, Ellis P, Van Loo P, McLaren S, Wedge DC, Fullam A, Alexandrov LB, Tubio JM, Stebbings L, Menzies A, Widaa S, Stratton MR, Jones PH, Campbell PJ. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 2015; 348:880-6. [PMID: 25999502 PMCID: PMC4471149 DOI: 10.1126/science.aaa6806] [Citation(s) in RCA: 1245] [Impact Index Per Article: 124.5] [Indexed: 12/21/2022]
Abstract
How somatic mutations accumulate in normal cells is central to understanding cancer development but is poorly understood. We performed ultradeep sequencing of 74 cancer genes in small (0.8 to 4.7 square millimeters) biopsies of normal skin. Across 234 biopsies of sun-exposed eyelid epidermis from four individuals, the burden of somatic mutations averaged two to six mutations per megabase per cell, similar to that seen in many cancers, and exhibited characteristic signatures of exposure to ultraviolet light. Remarkably, multiple cancer genes are under strong positive selection even in physiologically normal skin, including most of the key drivers of cutaneous squamous cell carcinomas. Positively selected mutations were found in 18 to 32% of normal skin cells at a density of ~140 driver mutations per square centimeter. We observed variability in the driver landscape among individuals and variability in the sizes of clonal expansions across genes. Thus, aged sun-exposed skin is a patchwork of thousands of evolving clones with over a quarter of cells carrying cancer-causing mutations while maintaining the physiological functions of epidermis.
Affiliation(s)
| | - Amit Roshan
- MRC Cancer Unit, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, UK
| | - Moritz Gerstung
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Peter Ellis
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Peter Van Loo
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK. Francis Crick Institute, London, UK. Department of Human Genetics, University of Leuven, Leuven, Belgium
| | - Stuart McLaren
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - David C Wedge
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Anthony Fullam
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | | | - Jose M Tubio
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Lucy Stebbings
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Andrew Menzies
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | - Sara Widaa
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK
| | | | - Philip H Jones
- MRC Cancer Unit, Hutchison-MRC Research Centre, University of Cambridge, Cambridge, UK.
| | - Peter J Campbell
- Wellcome Trust Sanger Institute, Hinxton CB10 1SA, Cambridgeshire, UK. Department of Haematology, University of Cambridge, Cambridge, UK.
|
13
|
Foroughmand-Araabi MH, Goliaei B, Alishahi K, Sadeghi M, Goliaei S. Codon usage and protein sequence pattern dependency in different organisms: A Bioinformatics approach. J Bioinform Comput Biol 2014; 13:1550002. [PMID: 25409941 DOI: 10.1142/s021972001550002x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Although it is known that synonymous codons are not chosen randomly, the role of codon usage in gene regulation is not yet clearly understood. Researchers have investigated the relation between codon usage and various properties, such as gene regulation, translation rate, translation efficiency, mRNA stability, splicing, and protein domains. Recently, a universal codon-usage-based mechanism for gene regulation was proposed. We studied the effect of protein sequence patterns on the codon usage of related genes. Considering a subsequence of a protein that matches a pattern or motif, we showed that the parts of the genes that are translated to this subsequence use specific ratios of synonymous codons. We also built a multinomial logistic regression model for codon usage that considers the effect of patterns on codon usage. This model accounts for the observed codon usage preference better than the classic organism-dependent codon usage. Our results showed that codon usage plays a role in controlling protein levels for genes that participate in a specific biological function. This is the first time that this phenomenon has been reported.
|
14
|
Mohanty AK, Datta A, Venkatraj V. Using the message passing algorithm on discrete data to detect faults in boolean regulatory networks. Algorithms Mol Biol 2014. [DOI: 10.1186/s13015-014-0020-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022] Open
|
15
|
Zhao Y, Epstein RJ. Conserved nonsense-prone CpG sites in apoptosis-regulatory genes: conditional stop signs on the road to cell death. Evol Bioinform Online 2013; 9:275-83. [PMID: 23908585 PMCID: PMC3728200 DOI: 10.4137/ebo.s11759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Methylation-prone CpG dinucleotides are strongly conserved in the germline, yet are also predisposed to somatic mutation. Here we quantify the relationship between germline codon mutability and somatic carcinogenesis by comparing usage of the nonsense-prone CGA (→TGA) codons in gene groups that differ in apoptotic function; to this end, suppressor genes were subclassified as either apoptotic (gatekeepers) or repair (caretakers). Mutations affecting CGA codons in sporadic tumors proved to be highly asymmetric. Moreover, nonsense mutations were 3-fold more likely to affect gatekeepers than caretakers. In addition, intragenic CGA clustering nonrandomly affected functionally critical regions of gatekeepers. We conclude that human gatekeeper suppressor genes are enriched for nonsense-prone codons, and submit that this germline vulnerability to tumors could reflect in utero selection for a methylation-dependent capability to short-circuit environmental insults that otherwise trigger apoptosis and fetal loss.
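The codon-usage comparison described in this abstract rests on counting in-frame CGA codons, each of which becomes a TGA stop codon after a single C→T change. A minimal sketch of that counting step follows; the function name and the toy sequence are illustrative assumptions, not taken from the paper.

```python
def count_inframe_codon(cds: str, codon: str = "CGA") -> int:
    """Count in-frame occurrences of `codon` in a coding sequence.

    Assumes `cds` starts at the first base of the reading frame
    (hypothetical input convention for this sketch).
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return sum(1 for i in range(0, usable, 3) if seq[i:i + 3] == codon)


# Toy coding sequence (made up): ATG CGA TTT CGA CGT TGA
toy_cds = "ATGCGATTTCGACGTTGA"
print(count_inframe_codon(toy_cds))
```

Only codon-aligned matches are counted, so a CGA that straddles two codons is ignored.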
Affiliation(s)
- Yongzhong Zhao
- Department of Genetics, Mount Sinai School of Medicine, New York, USA
|
16
|
Roberts ND, Kortschak RD, Parker WT, Schreiber AW, Branford S, Scott HS, Glonek G, Adelson DL. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics 2013; 29:2223-30. [PMID: 23842810 PMCID: PMC3753564 DOI: 10.1093/bioinformatics/btt375] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Indexed: 02/04/2023]
Abstract
Motivation: With the advent of relatively affordable high-throughput technologies, DNA sequencing of cancers is now common practice in cancer research projects and will be increasingly used in clinical practice to inform diagnosis and treatment. Somatic (cancer-only) single nucleotide variants (SNVs) are the simplest class of mutation, yet their identification in DNA sequencing data is confounded by germline polymorphisms, tumour heterogeneity and sequencing and analysis errors. Four recently published algorithms for the detection of somatic SNV sites in matched cancer–normal sequencing datasets are VarScan, SomaticSniper, JointSNVMix and Strelka. In this analysis, we apply these four SNV calling algorithms to cancer–normal Illumina exome sequencing of a chronic myeloid leukaemia (CML) patient. The candidate SNV sites returned by each algorithm are filtered to remove likely false positives, then characterized and compared to investigate the strengths and weaknesses of each SNV calling algorithm. Results: Comparing the candidate SNV sets returned by VarScan, SomaticSniper, JointSNVMix2 and Strelka revealed substantial differences with respect to the number and character of sites returned; the somatic probability scores assigned to the same sites; their susceptibility to various sources of noise; and their sensitivities to low-allelic-fraction candidates. Availability: Data accession number SRA081939, code at http://code.google.com/p/snv-caller-review/ Contact: david.adelson@adelaide.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Nicola D Roberts
- School of Molecular and Biomedical Science and School of Mathematical Sciences, University of Adelaide, South Australia, Australia
|
17
|
Huang T, Niu S, Xu Z, Huang Y, Kong X, Cai YD, Chou KC. Predicting transcriptional activity of multiple site p53 mutants based on hybrid properties. PLoS One 2011; 6:e22940. [PMID: 21857971 PMCID: PMC3152557 DOI: 10.1371/journal.pone.0022940] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 04/23/2011] [Accepted: 07/01/2011] [Indexed: 11/26/2022] Open
Abstract
Mutated p53, an important tumor suppressor protein, is found in many kinds of human cancers, and restoring active p53 can lead to tumor regression. In this work, we developed a new computational method to predict the transcriptional activity of one-, two-, three- and four-site p53 mutants. Following the general form of pseudo amino acid composition, we used eight types of features to represent each mutation and then selected the optimal prediction features using the maximum relevance, minimum redundancy, and incremental feature selection methods. The Matthews correlation coefficients (MCC) obtained with the nearest neighbor algorithm and jackknife cross-validation for one-, two-, three- and four-site p53 mutants were 0.678, 0.314, 0.705, and 0.907, respectively. Further analysis of the optimal feature sets revealed that 2D (two-dimensional) structure features made up the largest part of the optimal feature set and may have played the most important role in predicting the active status of all four types of p53 mutant. The optimal feature sets, especially the top-ranked features, also showed that 3D structure features, conservation, and the physicochemical and biochemical properties of amino acids near the mutation site played important roles in predicting p53 mutant active status. Our study provides a new and promising approach for finding functionally important sites and the relevant features for in-depth study of the p53 protein and its mechanism of action.
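For readers unfamiliar with the metric reported above, the Matthews correlation coefficient for a binary classifier is computed from the four confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is a generic illustration of that standard formula, not the authors' code; the function name and the zero-denominator convention are assumptions.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (a common convention,
    assumed here, since the ratio is undefined in that case).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only
print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

The value ranges from −1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which is why it is often preferred over accuracy for imbalanced classes.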
Affiliation(s)
- Tao Huang
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
- Shanghai Center for Bioinformation Technology, Shanghai, People's Republic of China
| | - Shen Niu
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, People's Republic of China
| | - Zhongping Xu
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
| | - Yun Huang
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
| | - Xiangyin Kong
- Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, People's Republic of China
- State Key Laboratory of Medical Genomics, Ruijin Hospital, Shanghai Jiaotong University, Shanghai, People's Republic of China
| | - Yu-Dong Cai
- Institute of Systems Biology, Shanghai University, Shanghai, People's Republic of China
- Centre for Computational Systems Biology, Fudan University, Shanghai, People's Republic of China
- Gordon Life Science Institute, San Diego, California, United States of America
| | - Kuo-Chen Chou
- Gordon Life Science Institute, San Diego, California, United States of America
|
18
|
Abstract
A key goal in cancer research is to find the genomic alterations that underlie malignant cells. Genomics has proved successful in identifying somatic variants at a large scale. However, it has become evident that a typical cancer exhibits a heterogeneous mutation pattern across samples. Cases where the same alteration is observed repeatedly seem to be the exception rather than the norm. Thus, pinpointing the key alterations (driver mutations) from a background of variations with no direct causal link to cancer (passenger mutations) is difficult. Here we analyze somatic missense mutations from cancer samples and their healthy tissue counterparts (germline mutations) from the viewpoint of germline fitness. We calibrate a scoring system from protein domain alignments to score mutations and their target loci. We show first that this score predicts to a good degree the rate of polymorphism of the observed germline variation. The scoring is then applied to somatic mutations. We show that candidate cancer genes prone to copy number loss harbor mutations with germline fitness effects that are significantly more deleterious than expected by chance. This suggests that missense mutations play a driving role in tumor suppressor genes. Furthermore, these mutations fall preferentially onto loci in sequence neighborhoods that are high scoring in terms of germline fitness. In contrast, for somatic mutations in candidate oncogenes we do not observe a statistically significant effect. These results help to inform how to exploit germline fitness predictions in discovering new genes and mutations responsible for cancer.
|
19
|
Gu Y, Yang D, Zou J, Ma W, Wu R, Zhao W, Zhang Y, Xiao H, Gong X, Zhang M, Zhu J, Guo Z. Systematic interpretation of comutated genes in large-scale cancer mutation profiles. Mol Cancer Ther 2010; 9:2186-95. [PMID: 20663929 DOI: 10.1158/1535-7163.mct-10-0022] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]
Abstract
Through high-throughput screens of somatic mutations in cancer genomes, hundreds of cancer genes are being rapidly identified, providing abundant information for systematically deciphering the genetic changes underlying cancer mechanisms. However, the functional collaboration of mutated genes is often neglected in current studies. Here, using four genome-wide somatic mutation data sets and pathways defined in various databases, we showed that gene pairs significantly comutated in cancer samples tend to be distributed between pathways rather than within pathways. At the basic functional level of motifs in the human protein-protein interaction network, we also found that comutated gene pairs were overrepresented between motifs but extremely depleted within motifs. Specifically, we showed that by using Gene Ontology, which describes gene functions at various levels of specificity, we could tackle the pathway definition problem to some degree and study the functional collaboration of gene mutations in cancer genomes more efficiently. Then, by defining pairs of pathways frequently linked by comutated gene pairs as between-pathway models, we showed that they are also likely to be codisrupted by mutations of the interpathway hubs of the coupled pathways, suggesting new hints for understanding the heterogeneous mechanisms of cancers. Finally, we showed that some between-pathway models consisting of important pathways such as cell cycle checkpoint and cell proliferation were codisrupted in most cancer samples in this study, suggesting that their codisruption might be functionally essential in inducing these cancers. Altogether, our results provide a way to disentangle the complex collaboration of the molecular processes underlying cancer mechanisms.
Affiliation(s)
- Yunyan Gu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
|
20
|
Ye J, Pavlicek A, Lunney EA, Rejto PA, Teng CH. Statistical method on nonrandom clustering with application to somatic mutations in cancer. BMC Bioinformatics 2010; 11:11. [PMID: 20053295 PMCID: PMC2822753 DOI: 10.1186/1471-2105-11-11] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.8] [Received: 06/12/2009] [Accepted: 01/07/2010] [Indexed: 02/07/2023] Open
Abstract
Background: Human cancer is caused by the accumulation of tumor-specific mutations in oncogenes and tumor suppressors that confer a selective growth advantage to cells. As a consequence of genomic instability and high levels of proliferation, many passenger mutations that do not contribute to the cancer phenotype arise alongside mutations that drive oncogenesis. While several approaches have been developed to separate driver mutations from passengers, few approaches can specifically identify activating driver mutations in oncogenes, which are more amenable for pharmacological intervention. Results: We propose a new statistical method for detecting activating mutations in cancer by identifying nonrandom clusters of amino acid mutations in protein sequences. A probability model is derived using order statistics assuming that the location of amino acid mutations on a protein follows a uniform distribution. Our statistical measure is the difference between pair-wise order statistics, which is equivalent to the size of an amino acid mutation cluster, and the probabilities are derived from exact and approximate distributions of the statistical measure. Using data in the Catalog of Somatic Mutations in Cancer (COSMIC) database, we have demonstrated that our method detects well-known clusters of activating mutations in KRAS, BRAF, PI3K, and β-catenin. The method can also identify new cancer targets as well as gain-of-function mutations in tumor suppressors. Conclusions: Our proposed method is useful to discover activating driver mutations in cancer by identifying nonrandom clusters of somatic amino acid mutations in protein sequences.
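The order-statistic idea in this abstract can be illustrated with a continuous approximation. If n mutation positions are independent and uniform on [0, 1], the gap between the i-th and (i+k)-th ordered positions follows a Beta(k, n−k+1) distribution, so the probability that the gap is at most d equals the probability that a Binomial(n, d) variable is at least k. The sketch below uses that textbook identity; it is not the paper's exact discrete derivation, the function name and example numbers are assumptions, and it applies no multiple-testing correction across candidate clusters.

```python
from math import comb


def cluster_probability(n_mutations: int, k_spacing: int, width_fraction: float) -> float:
    """P(X_(i+k) - X_(i) <= d) for n iid Uniform(0, 1) positions.

    k_spacing is the number of gaps spanned (a cluster of k + 1 mutations),
    width_fraction is the cluster width divided by the protein length.
    Continuous-uniform approximation; an assumption of this sketch.
    """
    n, k = n_mutations, k_spacing
    if not 1 <= k <= n - 1:
        raise ValueError("k_spacing must satisfy 1 <= k <= n_mutations - 1")
    d = min(max(width_fraction, 0.0), 1.0)
    # Upper tail of Binomial(n, d), equal to the Beta(k, n - k + 1) CDF at d
    return sum(comb(n, m) * d**m * (1.0 - d) ** (n - m) for m in range(k, n + 1))


# Hypothetical example: 20 mutations on a 500-residue protein,
# 5 of them (4 gaps) falling within a 10-residue window.
print(cluster_probability(n_mutations=20, k_spacing=4, width_fraction=10 / 500))
```

A small probability indicates that such a tight cluster would be unlikely if mutations were scattered uniformly along the protein.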
Affiliation(s)
- Jingjing Ye
- Global Pre-Clinical Statistics, Pfizer Global Research and Development, San Diego, CA 92121, USA.
|
21
|
Ro S, Rannala B. Inferring somatic mutation rates using the stop-enhanced green fluorescent protein mouse. Genetics 2007; 177:9-16. [PMID: 17603123 PMCID: PMC2013726 DOI: 10.1534/genetics.106.069310] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/06/2006] [Accepted: 06/13/2007] [Indexed: 02/06/2023] Open
Abstract
A new method is developed for estimating rates of somatic mutation in vivo. The stop-enhanced green fluorescent protein (EGFP) transgenic mouse carries multiple copies of an EGFP gene with a premature stop codon. The gene can revert to a functional form via point mutations. Mice treated with a potent mutagen, N-ethyl-N-nitrosourea (ENU), and mice treated with a vehicle alone are assayed for mutations in liver cells. A stochastic model is developed to model the mutation and gene expression processes and maximum-likelihood estimators of the model parameters are derived. A likelihood-ratio test (LRT) is developed for detecting mutagenicity. Parametric bootstrap simulations are used to obtain confidence intervals of the parameter estimates and to estimate the significance of the LRT. The LRT is highly significant (alpha < 0.01) and the 95% confidence interval for the relative effect of the mutagen (the ratio of the rate of mutation during the interval of mutagen exposure to the rate of background mutation) ranges from a minimum 200-fold effect of the mutagen to a maximum 2000-fold effect.
Affiliation(s)
- Simon Ro
- Department of Medical Genetics, University of Alberta, Edmonton, Alberta T6G 2H7, Canada
|
22
|
Kashkin KN, Khlgatian SV, Gurova OV, Kuprash DV, Nedospasov SA. New mutations in the human p53 gene--a regulator of the cell cycle and carcinogenesis. BIOCHEMISTRY (MOSCOW) 2007; 72:282-92. [PMID: 17447881 DOI: 10.1134/s0006297907030054] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
Abstract
Mutations in the tumor suppressor gene p53 often disrupt the cell cycle and the control of genetic integrity in cells, which may contribute to tumor development. We studied p53 gene mutations in 26 primary tumors of colorectal cancer patients. Mutations in p53 were found in 17 tumors (65.4%). All point mutations affected the DNA binding domain of p53 and were localized in exons 4-8 of the gene. Mutant p53 isoforms with altered domain structure and/or with an alternative C-terminus arising from frameshift mutations or abnormal splicing were found in six tumors. Mutations Leu111Gln and Ser127Phe were shown in colorectal cancer for the first time. Isoforms p53-305 with a C(4) insertion in codons 300/301 and p53i9* including an additional 44 nucleotides of the 3′-end of intron 9 were discovered for the first time. Mutations of p53 were associated with lymph node metastases and stage III/IV tumors, which are signs of unfavorable prognosis in colorectal cancer.
Affiliation(s)
- K N Kashkin
- Department of Molecular Immunology, Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow, Russia.
|
23
|
Zhang W, Bouffard GG, Wallace SS, Bond JP. Estimation of DNA sequence context-dependent mutation rates using primate genomic sequences. J Mol Evol 2007; 65:207-14. [PMID: 17676366 DOI: 10.1007/s00239-007-9000-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 03/16/2004] [Accepted: 11/30/2006] [Indexed: 10/23/2022]
Abstract
It is understood that DNA and amino acid substitution rates are highly sequence context-dependent, e.g., C → T substitutions in vertebrates may occur much more frequently at CpG sites and that cysteine substitution rates may depend on support of the context for participation in a disulfide bond. Furthermore, many applications rely on quantitative models of nucleotide or amino acid substitution, including phylogenetic inference and identification of amino acid sequence positions involved in functional specificity. We describe quantification of the context dependence of nucleotide substitution rates using baboon, chimpanzee, and human genomic sequence data generated by the NISC Comparative Sequencing Program. Relative mutation rates are reported for the 96 classes of mutations of the form 5′-αβγ-3′ → 5′-αδγ-3′, where α, β, γ, and δ are nucleotides and β ≠ δ, based on maximum likelihood calculations. Our results confirm that C → T substitutions are enhanced at CpG sites compared with other transitions, relatively independent of the identity of the preceding nucleotide. While, as expected, transitions generally occur more frequently than transversions, we find that the most frequent transversions involve the C at CpG sites (CpG transversions) and that their rate is comparable to the rate of transitions at non-CpG sites. A four-class model of the rates of context-dependent evolution of primate DNA sequences, CpG transitions > non-CpG transitions ≈ CpG transversions > non-CpG transversions, captures qualitative features of the mutation spectrum. We find that despite qualitative similarity of mutation rates among different genomic regions, there are statistically significant differences.
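The count of 96 classes can be reproduced by enumeration. There are 4 × 4 × 4 × 3 = 192 ordered trinucleotide changes of the form αβγ → αδγ with β ≠ δ, and treating each change as equivalent to its reverse complement on the opposite strand halves that to 96. The sketch below assumes this strand-collapsing convention, which the abstract does not spell out; all names are illustrative.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def context_classes() -> list[tuple[str, str]]:
    """Enumerate strand-collapsed classes of 5'-abg-3' -> 5'-adg-3' changes."""
    classes = set()
    for a, b, g, d in product(BASES, repeat=4):
        if b == d:
            continue  # the middle base must actually change
        ref, alt = a + b + g, a + d + g
        mirrored = (reverse_complement(ref), reverse_complement(alt))
        # Keep one canonical representative per strand-equivalent pair
        classes.add(min((ref, alt), mirrored))
    return sorted(classes)


classes = context_classes()
print(len(classes))
```

No change is its own reverse complement, because the middle base would have to equal its complement, so the 192 ordered changes pair off exactly.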
Affiliation(s)
- Wei Zhang
- Department of Medicine, University of Chicago, 515 CLSC, Chicago, IL 60637, USA
|
24
|
Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, Davies H, Teague J, Butler A, Stevens C, Edkins S, O'Meara S, Vastrik I, Schmidt EE, Avis T, Barthorpe S, Bhamra G, Buck G, Choudhury B, Clements J, Cole J, Dicks E, Forbes S, Gray K, Halliday K, Harrison R, Hills K, Hinton J, Jenkinson A, Jones D, Menzies A, Mironenko T, Perry J, Raine K, Richardson D, Shepherd R, Small A, Tofts C, Varian J, Webb T, West S, Widaa S, Yates A, Cahill DP, Louis DN, Goldstraw P, Nicholson AG, Brasseur F, Looijenga L, Weber BL, Chiew YE, DeFazio A, Greaves MF, Green AR, Campbell P, Birney E, Easton DF, Chenevix-Trench G, Tan MH, Khoo SK, Teh BT, Yuen ST, Leung SY, Wooster R, Futreal PA, Stratton MR. Patterns of somatic mutation in human cancer genomes. Nature 2007; 446:153-8. [PMID: 17344846 PMCID: PMC2712719 DOI: 10.1038/nature05610] [Citation(s) in RCA: 2288] [Impact Index Per Article: 127.1] [Received: 09/07/2006] [Accepted: 01/18/2007] [Indexed: 11/09/2022]
Abstract
Cancers arise owing to mutations in a subset of genes that confer growth advantage. The availability of the human genome sequence led us to propose that systematic resequencing of cancer genomes for mutations would lead to the discovery of many additional cancer genes. Here we report more than 1,000 somatic mutations found in 274 megabases (Mb) of DNA corresponding to the coding exons of 518 protein kinase genes in 210 diverse human cancers. There was substantial variation in the number and pattern of mutations in individual cancers reflecting different exposures, DNA repair defects and cellular origins. Most somatic mutations are likely to be 'passengers' that do not contribute to oncogenesis. However, there was evidence for 'driver' mutations contributing to the development of the cancers studied in approximately 120 genes. Systematic sequencing of cancer genomes therefore reveals the evolutionary diversity of cancers and implicates a larger repertoire of cancer genes than previously anticipated.
Affiliation(s)
- Christopher Greenman
- Cancer Genome Project, Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
|
25
|
Darling AE, Treangen TJ, Messeguer X, Perna NT. Analyzing patterns of microbial evolution using the mauve genome alignment system. Methods Mol Biol 2007; 396:135-52. [PMID: 18025691 DOI: 10.1007/978-1-59745-515-2_10] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 12/03/2022]
Abstract
During the course of evolution, genomes can undergo large-scale mutation events such as rearrangement and lateral transfer. Such mutations can result in significant variations in gene order and gene content among otherwise closely related organisms. The Mauve genome alignment system can successfully identify such rearrangement and lateral transfer events in comparisons of multiple microbial genomes even under high levels of recombination. This chapter outlines the main features of Mauve and provides examples that describe how to use Mauve to conduct a rigorous multiple genome comparison and study evolutionary patterns.
|
26
|
Greenman C, Wooster R, Futreal PA, Stratton MR, Easton DF. Statistical analysis of pathogenicity of somatic mutations in cancer. Genetics 2006; 173:2187-98. [PMID: 16783027 PMCID: PMC1569711 DOI: 10.1534/genetics.105.044677] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.0] [Indexed: 01/19/2023] Open
Abstract
Recent large-scale sequencing studies have revealed that cancer genomes contain variable numbers of somatic point mutations distributed across many genes. These somatic mutations most likely include passenger mutations that are not cancer causing and pathogenic driver mutations in cancer genes. Establishing a significant presence of driver mutations in such data sets is of biological interest. Whereas current techniques from phylogeny are applicable to large data sets composed of singly mutated samples, recently exemplified with a p53 mutation database, methods for smaller data sets containing individual samples with multiple mutations need to be developed. By constructing distinct models of both the mutation process and selection pressure upon the cancer samples, exact statistical tests to examine this problem are devised. Tests to examine the significance of selection toward missense, nonsense, and splice site mutations are derived, along with tests assessing variation in selection between functional domains. Maximum-likelihood methods facilitate parameter estimation, including levels of selection pressure and minimum numbers of pathogenic mutations. These methods are illustrated with 25 breast cancers screened across the coding sequences of 518 kinase genes, revealing 90 base substitutions in 71 genes. Significant selection pressure upon truncating mutations was established. Furthermore, an estimated minimum of 29.8 mutations were pathogenic.
Affiliation(s)
- Chris Greenman
- Cancer Genome Project, Wellcome Trust Sanger Institute, Cambridge, United Kingdom.
|
27
|
Kouidou S, Malousi A, Maglaveras N. Methylation and repeats in silent and nonsense mutations of p53. Mutat Res 2006; 599:167-77. [PMID: 16620878 DOI: 10.1016/j.mrfmmm.2006.03.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 11/29/2005] [Revised: 02/22/2006] [Accepted: 03/01/2006] [Indexed: 12/16/2022]
Abstract
All exonic CG sequences in p53 are methylated; this epigenetic modification is correlated with frequent G:C→A:T transitions in p53. Recent reports reveal the presence in p53 of non-CG methylation in CC and CCC sequences, complementary to sites of selective guanosine adduct formation (GG and GGG), and the association of genetic instability with methylation at repetitive sequences. We presently investigated the distribution of methylation sites and repetitive elements in silent and nonsense p53 mutations (2051) among the IARC's TP53 somatic mutation database for exons 5-8. Silent mutations are nonrandom, but mostly involve G:C→A:T transitions (62%); in particular C→T mutations (39% of all silent mutations) are mostly correlated with CC and CCC sequences, while G→A mutations with GG sequences. Sequence analysis of all non-G:C→A:T silent mutations reveals the frequent formation of new methylation sites (CG), new CCC and GGG sequences in the resulting sequence, refinement of symmetry elements at interrupted microsatellite-like sequences and formation of small repeats (55.3%). The G:C→A:T silent mutations characterize cancers associated with cigarette smoking (e.g. bladder or lung and bronchus cancer versus colorectal cancer); on the contrary, non-G:C→A:T silent mutations have similar frequencies in most cancers. Nonsense mutations in exons 5-8, all resulting in mutants lacking amino acids 307-393, which are crucial for p53 activity, were also analyzed. The frequency of nonsense mutations is higher at methylated sites or repeats 1-2 nucleotides removed from methylation sites. Frameshift mutations are also more frequent at repeated sequences. The frequent G:C→A:T silent mutations could indicate that CC and CCC sequences of exons 5-8 are occasionally targets of non-CpG methylation of cytosine. This process of de novo methylation in the presence of microsatellite-like sequences and small repeats might influence the genetic stability of a variety of genes.
Affiliation(s)
- Sofia Kouidou
- Laboratory of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece.
|