1
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski SD, Gupta S, Booth JG, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. Nat Commun 2025; 16:975. [PMID: 39856048 PMCID: PMC11760531 DOI: 10.1038/s41467-024-54176-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 11/04/2024] [Indexed: 01/27/2025] Open
Abstract
A major goal of cancer biology is to understand the mechanisms driven by somatically acquired mutations. Two distinct methodologies, one analyzing mutation clustering within protein sequences and 3D structures and the other leveraging protein-protein interaction network topology, offer complementary strengths. We present NetFlow3D, a unified, end-to-end 3D structurally-informed protein interaction network propagation framework that maps the multiscale mechanistic effects of mutations. Built upon the Human Protein Structurome, which incorporates the 3D structures of every protein and the binding interfaces of all known protein interactions, NetFlow3D integrates atomic, residue, protein and network-level information: it clusters mutations on 3D protein structures to identify driver mutations and propagates their impacts anisotropically across the protein interaction network, guided by the involved interaction interfaces, to reveal systems-level impacts. Applied to 33 cancer types, NetFlow3D identifies 2 times more 3D clusters and incorporates 8 times more proteins in significantly interconnected network modules compared to traditional methods.
Affiliation(s)
- Yingying Zhang
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
  - Department of Molecular Biology and Genetics, Cornell University, Ithaca, 14853, NY, USA
- Alden K Leung
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Yu Sun
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Guanxi Wu
  - College of Agriculture and Life Sciences, Cornell University, Ithaca, 14853, NY, USA
- Le Li
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Jiayang Sun
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
- Lily Cheng
  - Department of Science and Technology Studies, Cornell University, Ithaca, 14853, NY, USA
- Tian Qiu
  - School of Electrical and Computer Engineering, Cornell University, Ithaca, 14853, NY, USA
- Junke Zhang
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Shayne D Wierbowski
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- Shagun Gupta
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
- James G Booth
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Department of Statistics and Data Science, Cornell University, Ithaca, 14853, NY, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University, Ithaca, 14853, NY, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, 14853, NY, USA
2
Zhang Y, Leung AK, Kang JJ, Sun Y, Wu G, Li L, Sun J, Cheng L, Qiu T, Zhang J, Wierbowski S, Gupta S, Booth J, Yu H. A multiscale functional map of somatic mutations in cancer integrating protein structure and network topology. bioRxiv 2024:2023.03.06.531441. [PMID: 36945530 PMCID: PMC10028849 DOI: 10.1101/2023.03.06.531441] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2023]
Abstract
A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one focuses on analyzing the clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of the protein-protein interaction network. Their insights are largely non-overlapping, offering complementary strengths. Here, we established a unified, end-to-end 3D structurally-informed protein interaction network propagation framework, NetFlow3D, that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. The establishment of NetFlow3D hinges upon the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every single protein as well as the binding interfaces of all known protein interactions in humans. NetFlow3D leverages the Structurome to integrate information across atomic, residue, protein and network levels: it conducts 3D clustering of mutations across atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network "modules", thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 1,4444 significant 3D clusters throughout the Human Protein Structurome, of which ~55% would not have been found using only experimentally determined structures. It then identified 26 significantly interconnected modules that encompass ~8-fold more proteins than standard network analyses. NetFlow3D and our pan-cancer results can be accessed from http://netflow3d.yulab.org.
Affiliation(s)
- Yingying Zhang
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
  - Department of Molecular Biology and Genetics, Cornell University; Ithaca, 14853, USA
- Alden K. Leung
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Jin Joo Kang
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Yu Sun
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Guanxi Wu
  - College of Agriculture and Life Sciences, Cornell University; Ithaca, 14853, USA
- Le Li
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Jiayang Sun
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
- Lily Cheng
  - Department of Science and Technology Studies, Cornell University; Ithaca, 14853, USA
- Tian Qiu
  - School of Electrical and Computer Engineering, Cornell University; Ithaca, 14853, USA
- Junke Zhang
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Shayne Wierbowski
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- Shagun Gupta
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
- James Booth
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Department of Statistics and Data Science, Cornell University; Ithaca, 14853, USA
- Haiyuan Yu
  - Department of Computational Biology, Cornell University; Ithaca, 14853, USA
  - Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA
3
Wang B, Lei X, Tian W, Perez-Rathke A, Tseng YY, Liang J. Structure-based pathogenicity relationship identifier for predicting effects of single missense variants and discovery of higher-order cancer susceptibility clusters of mutations. Brief Bioinform 2023; 24:bbad206. [PMID: 37332013 PMCID: PMC10359089 DOI: 10.1093/bib/bbad206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2023] [Revised: 04/19/2023] [Accepted: 05/13/2023] [Indexed: 06/20/2023] Open
Abstract
We report the structure-based pathogenicity relationship identifier (SPRI), a novel computational tool for accurate evaluation of pathological effects of missense single mutations and prediction of higher-order spatially organized units of mutational clusters. SPRI can effectively extract properties determining pathogenicity encoded in protein structures, and can identify deleterious missense mutations of germ line origin associated with Mendelian diseases, as well as mutations of somatic origin associated with cancer drivers. It compares favorably to other methods in predicting deleterious mutations. Furthermore, SPRI can discover spatially organized pathogenic higher-order spatial clusters (patHOS) of deleterious mutations, including those of low recurrence, and can be used for discovery of candidate cancer driver genes and driver mutations. We further demonstrate that SPRI can take advantage of AlphaFold2 predicted structures and can be deployed for saturation mutation analysis of the whole human proteome.
Affiliation(s)
- Boshen Wang
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Xue Lei
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Wei Tian
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Alan Perez-Rathke
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
- Yan-Yuan Tseng
  - Center for Molecular Medicine and Genetics, Biochemistry and Molecular Biology Department, School of Medicine, Wayne State University, 540 E. Canfield Avenue, 48201 MI, USA
- Jie Liang
  - Center for Bioinformatics and Quantitative Biology, Richard and Loan Hill Department of Biomedical Engineering, University of Illinois at Chicago, W103 Suite, 820 S Wood St, 60612 IL, USA
4
Li S, Chen X, Chen J, Wu B, Liu J, Guo Y, Li M, Pu X. Multi-omics integration analysis of GPCRs in pan-cancer to uncover inter-omics relationships and potential driver genes. Comput Biol Med 2023; 161:106988. [PMID: 37201441 DOI: 10.1016/j.compbiomed.2023.106988] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/14/2023] [Revised: 03/30/2023] [Accepted: 04/27/2023] [Indexed: 05/20/2023]
Abstract
G protein-coupled receptors (GPCRs) are the largest drug target family. Unfortunately, applications of GPCRs in cancer therapy are scarce because knowledge of their correlations with cancers is very limited. Multi-omics data enable systematic investigations of GPCRs, yet their effective integration remains a challenge due to the complexity of the data. Here, we adopt two types of integration strategies, multi-staged and meta-dimensional approaches, to fully characterize somatic mutations, somatic copy number alterations (SCNAs), DNA methylation, and mRNA expression of GPCRs in 33 cancers. Results from the multi-staged integration reveal that GPCR mutations do not predict expression dysregulation well. The correlations between expression and SCNAs are primarily positive, while the correlations of methylation with expression and SCNAs are bimodal, with negative correlations predominating. Based on these correlations, 32 and 144 potential cancer-related GPCRs driven by aberrant SCNA and methylation are identified, respectively. In addition, the meta-dimensional integration analysis is carried out using deep learning models, which predict more than one hundred GPCRs as potential oncogenes. When the results of the two integration strategies are compared, 165 cancer-related GPCRs are common to both, suggesting that they should be prioritized in future studies. However, 172 GPCRs emerge in only one, indicating that the two integration strategies should be considered concurrently, each complementing the information missed by the other, to obtain a more comprehensive understanding. Finally, correlation analysis further reveals that GPCRs, in particular the class A and adhesion receptors, are generally immune-related. On the whole, this work is the first to reveal the associations between different omics layers and highlights the necessity of combining the two strategies in identifying cancer-related GPCRs.
Affiliation(s)
- Shiqi Li
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Xin Chen
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Jianfang Chen
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Binjian Wu
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Jing Liu
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu, 610064, China
5
Pandey M, Gromiha MM. MutBLESS: A tool to identify disease-prone sites in cancer using deep learning. Biochim Biophys Acta Mol Basis Dis 2023; 1869:166721. [PMID: 37105446 DOI: 10.1016/j.bbadis.2023.166721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/07/2023] [Accepted: 04/12/2023] [Indexed: 04/29/2023]
Abstract
Understanding the molecular basis and impact of mutations at different stages of cancer are long-standing challenges in cancer biology. Identification of driver mutations from experiments is expensive and time-intensive. In the present study, we collected data for experimentally known driver mutations in 22 different cancer types and classified them into six categories: breast cancer (BRCA), acute myeloid leukaemia (LAML), endometrial carcinoma (EC), stomach cancer (STAD), skin cancer (SKCM), and other cancer types. The dataset contains 5747 disease-prone and 5514 neutral sites in 516 proteins. The analysis of amino acid distribution along mutant sites revealed that the motifs AAA and LR are preferred in disease-prone sites, whereas QPP and QF are dominant in neutral sites. Further, we developed a method using deep neural networks to predict disease-prone sites with amino acid sequence-based features such as physicochemical properties, secondary structure, tri-peptide motifs and conservation scores. We obtained an average AUC of 0.97 in the five cancer types BRCA, LAML, EC, STAD and SKCM in a test dataset, and 0.72 in all other cancer types together. Our method showed excellent performance for identifying cancer-specific mutations, with an average sensitivity, specificity, and accuracy of 96.56%, 97.39%, and 97.64%, respectively. We developed a web server for identifying cancer-prone sites, and it is available at https://web.iitm.ac.in/bioinfo2/MutBLESS/index.html. We suggest that our method can serve as an effective way to identify disease-prone sites and assist in developing therapeutic strategies.
Affiliation(s)
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai 600036, India
6
Pan-cancer clinical impact of latent drivers from double mutations. Commun Biol 2023; 6:202. [PMID: 36808143 PMCID: PMC9941481 DOI: 10.1038/s42003-023-04519-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/22/2021] [Accepted: 01/23/2023] [Indexed: 02/22/2023] Open
Abstract
Here, we discover potential 'latent driver' mutations in cancer genomes. Latent drivers have low frequencies and minor observable translational potential. As such, to date they have escaped identification. Their discovery is important, since when paired in cis, latent driver mutations can drive cancer. Our comprehensive statistical analysis of the pan-cancer mutation profiles of ~60,000 tumor sequences from the TCGA and AACR-GENIE cohorts identifies significantly co-occurring potential latent drivers. We observe 155 same-gene double mutations, of which 140 individual components are cataloged as latent drivers. Evaluation of cell lines and patient-derived xenograft response data to drug treatment indicates that in certain genes double mutations may have a prominent role in increasing oncogenic activity, hence obtaining a better drug response, as in PIK3CA. Taken together, our comprehensive analyses indicate that same-gene double mutations are exceedingly rare phenomena but are a signature for some cancer types, e.g., breast and lung cancers. The relative rarity of doublets can be explained by the likelihood of strong signals resulting in oncogene-induced senescence, and by doublets consisting of non-identical single residue components populating the background mutational load, thus not identified.
7
Functional and structural analyses of novel Smith-Kingsmore Syndrome-Associated MTOR variants reveal potential new mechanisms and predictors of pathogenicity. PLoS Genet 2021; 17:e1009651. [PMID: 34197453 PMCID: PMC8279410 DOI: 10.1371/journal.pgen.1009651] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/26/2021] [Revised: 07/14/2021] [Accepted: 06/08/2021] [Indexed: 12/31/2022] Open
Abstract
Smith-Kingsmore syndrome (SKS) is a rare neurodevelopmental disorder characterized by macrocephaly/megalencephaly, developmental delay, intellectual disability, hypotonia, and seizures. It is caused by dominant missense mutations in MTOR. The pathogenicity of novel variants in MTOR in patients with neurodevelopmental disorders can be difficult to determine and the mechanism by which variants cause disease remains poorly understood. We report 7 patients with SKS with 4 novel MTOR variants and describe their phenotypes. We perform in vitro functional analyses to confirm MTOR activation and interrogate disease mechanisms. We complete structural analyses to understand the 3D properties of pathogenic variants. We examine the accuracy of relative accessible surface area, a quantitative measure of amino acid side-chain accessibility, as a predictor of MTOR variant pathogenicity. We describe novel clinical features of patients with SKS. We confirm MTOR Complex 1 activation and identify MTOR Complex 2 activation as a new potential mechanism of disease in SKS. We find that pathogenic MTOR variants disproportionately cluster in hotspots in the core of the protein, where they disrupt alpha helix packing due to the insertion of bulky amino acid side chains. We find that relative accessible surface area is significantly lower for SKS-associated variants compared to benign variants. We expand the phenotype of SKS and demonstrate that additional pathways of activation may contribute to disease. Incorporating 3D properties of MTOR variants may help in pathogenicity classification. We hope these findings may contribute to improving the precision of care and therapeutic development for individuals with SKS.
Smith-Kingsmore Syndrome is a rare disease caused by damage in a gene named MTOR that is associated with excessive growth of the head and brain, delays in development and deficits in intellectual functioning. We report 7 patients who have changes in MTOR that have never been reported before. We describe new medical findings in these patients that may be common in Smith-Kingsmore Syndrome more broadly. We then identify how these new gene changes impact the function of the MTOR protein and thus cell function downstream. Lastly, we show that changes in the gene that lie deep inside the 3D structure of the MTOR protein are more likely to cause disease than those changes that lie on the surface of the protein. We may be able to use the 3D properties of MTOR gene changes to predict if future changes we see are likely to cause disease or not.
8
Urbanek-Trzeciak MO, Galka-Marciniak P, Nawrocka PM, Kowal E, Szwec S, Giefing M, Kozlowski P. Pan-cancer analysis of somatic mutations in miRNA genes. EBioMedicine 2020; 61:103051. [PMID: 33038763 PMCID: PMC7648123 DOI: 10.1016/j.ebiom.2020.103051] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 07/01/2020] [Revised: 09/16/2020] [Accepted: 09/16/2020] [Indexed: 02/08/2023] Open
Abstract
Background: miRNAs are considered important players in oncogenesis, serving either as oncomiRs or suppressormiRs. Although the accumulation of somatic alterations is an intrinsic aspect of cancer development and many important cancer-driving mutations have been identified in protein-coding genes, the area of functional somatic mutations in miRNA genes is heavily understudied.
Methods: Here, based on the analysis of large genomic datasets, mostly the whole-exome sequencing of over 10,000 cancer/normal sample pairs deposited within the TCGA repository, we undertook an analysis of somatic mutations in miRNA genes.
Findings: We identified and characterized over 10,000 somatic mutations and showed that some of the miRNA genes are overmutated in Pan-Cancer and/or specific cancers. Nonrandom occurrence of the identified mutations was confirmed by a strong association of overmutated miRNA genes with KEGG pathways, most of which were related to specific cancer types or cancer-related processes. Additionally, we showed that mutations in some of the overmutated genes correlate with miRNA expression, cancer staging, and patient survival.
Interpretation: Our study is the first comprehensive Pan-Cancer study of cancer somatic mutations in miRNA genes. It may help to understand the consequences of mutations in miRNA genes and the identification of miRNA functional mutations. The results may also be the first step (form the basis and provide the resources) in the development of computational and/or statistical approaches/tools dedicated to the identification of cancer-driver miRNA genes.
Funding: This work was supported by research grants from the Polish National Science Centre 2016/22/A/NZ2/00184 and 2015/17/N/NZ3/03629.
Affiliation(s)
- Paulina M Nawrocka
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
- Ewelina Kowal
  - Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Sylwia Szwec
  - Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Maciej Giefing
  - Institute of Human Genetics, Polish Academy of Sciences, Poznan, Poland
- Piotr Kozlowski
  - Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland