1
|
Ozturk K, Panwala R, Sheen J, Ford K, Jayne N, Portell A, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1. Cell Rep 2024; 43:114436. [PMID: 38968069 PMCID: PMC11345852 DOI: 10.1016/j.celrep.2024.114436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Revised: 05/15/2024] [Accepted: 06/19/2024] [Indexed: 07/07/2024] Open
Abstract
Single-gene missense mutations remain challenging to interpret. Here, we deploy scalable functional screening by sequencing (SEUSS), a Perturb-seq method, to generate mutations at protein interfaces of RUNX1 and quantify their effect on activities of downstream cellular programs. We evaluate single-cell RNA profiles of 115 mutations in myelogenous leukemia cells and categorize them into three functionally distinct groups, wild-type (WT)-like, loss-of-function (LoF)-like, and hypomorphic, that we validate in orthogonal assays. LoF-like variants dominate the DNA-binding site and are recurrent in cancer; however, recurrence alone does not predict functional impact. Hypomorphic variants share characteristics with LoF-like but favor protein interactions, promoting gene expression indicative of nerve growth factor (NGF) response and cytokine recruitment of neutrophils. Accessible DNA near differentially expressed genes frequently contains RUNX1-binding motifs. Finally, we reclassify 16 variants of uncertain significance and train a classifier to predict 103 more. Our work demonstrates the potential of targeting protein interactions to better define the landscape of phenotypes reachable by missense mutations.
Affiliation(s)
- Kivilcim Ozturk
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA
- Rebecca Panwala
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Jeanna Sheen
  - School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA
- Kyle Ford
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Nathan Jayne
  - School of Biological Sciences, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Andrew Portell
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Dong-Er Zhang
  - Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Stephan Hutter
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Torsten Haferlach
  - MLL Munich Leukemia Laboratory, Max-Lebsche-Platz 31, 81377 Munich, Germany
- Trey Ideker
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
- Prashant Mali
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA, USA
- Hannah Carter
  - Division of Medical Genetics, Department of Medicine, University of California, San Diego, La Jolla, CA, USA; Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA, USA; Moores Cancer Center, University of California, San Diego, La Jolla, CA, USA
|
2
|
Pandey M, Shah SK, Gromiha MM. Computational approaches for identifying disease-causing mutations in proteins. Adv Protein Chem Struct Biol 2023; 139:141-171. [PMID: 38448134 DOI: 10.1016/bs.apcsb.2023.11.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2024]
Abstract
Advancements in genome sequencing have expanded the scope of investigating mutations in proteins across different diseases. Amino acid mutations in a protein alter its structure, stability and function and some of them lead to diseases. Identification of disease-causing mutations is a challenging task and it will be helpful for designing therapeutic strategies. Hence, mutation data available in the literature have been curated and stored in several databases, which have been effectively utilized for developing computational methods to identify deleterious mutations (drivers), using sequence and structure-based properties of proteins. In this chapter, we describe the contents of specific databases that have information on disease-causing and neutral mutations followed by sequence and structure-based properties. Further, characteristic features of disease-causing mutations will be discussed along with computational methods for identifying cancer hotspot residues and disease-causing mutations in proteins.
Affiliation(s)
- Medha Pandey
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- Suraj Kumar Shah
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India
- M Michael Gromiha
  - Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, India; International Research Frontiers Initiative, School of Computing, Tokyo Institute of Technology, Yokohama, Japan
|
3
|
Weisman CM. The permissive binding theory of cancer. Front Oncol 2023; 13:1272981. [PMID: 38023252 PMCID: PMC10666763 DOI: 10.3389/fonc.2023.1272981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2023] [Accepted: 10/20/2023] [Indexed: 12/01/2023] Open
Abstract
The later stages of cancer, including the invasion and colonization of new tissues, are actively mysterious compared to earlier stages like primary tumor formation. While we lack many details about both, we do have an apparently successful explanatory framework for the earlier stages: one in which genetic mutations hold ultimate causal and explanatory power. By contrast, on both empirical and conceptual grounds, it is not currently clear that mutations alone can explain the later stages of cancer. Can a different type of molecular change do better? Here, I introduce the "permissive binding theory" of cancer, which proposes that novel protein binding interactions are the key causal and explanatory entity in invasion and metastasis. It posits that binding is more abundant at baseline than we observe because it is restricted in normal physiology; that any large perturbation to physiological state revives this baseline abundance, unleashing many new binding interactions; and that a subset of these cause the cellular functions at the heart of oncogenesis, especially invasion and metastasis. Significant physiological perturbations occur in cancer cells in very early stages, and generally become more extreme with progression, providing interactions that continually fuel invasion and metastasis. The theory is compatible with, but not limited to, causal roles for the diverse molecular changes observed in cancer (e.g. gene expression or epigenetic changes), as these generally act causally upstream of proteins, and so may exert their effects by changing the protein binding interactions that occur in the cell. This admits the possibility that molecular changes that appear quite different may actually converge in creating the same few protein complexes, simplifying our picture of invasion and metastasis. If correct, the theory offers a concrete therapeutic strategy: targeting the key novel complexes. 
The theory is straightforwardly testable by large-scale identification of protein interactions in different cancers.
Affiliation(s)
- Caroline M. Weisman
  - Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, United States
|
4
|
Hatano N, Kamada M, Kojima R, Okuno Y. Network-based prediction approach for cancer-specific driver missense mutations using a graph neural network. BMC Bioinformatics 2023; 24:383. [PMID: 37817080 PMCID: PMC10565986 DOI: 10.1186/s12859-023-05507-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 10/02/2023] [Indexed: 10/12/2023] Open
Abstract
BACKGROUND: In cancer genomic medicine, finding driver mutations involved in cancer development and tumor growth is crucial. Machine-learning methods to predict driver missense mutations have been developed because variants are frequently detected by genomic sequencing. However, even though the abnormalities in molecular networks are associated with cancer, many of these methods focus on individual variants and do not consider molecular networks. Here we propose a new network-based method, Net-DMPred, to predict driver missense mutations considering molecular networks. Net-DMPred consists of the graph part and the prediction part. In the graph part, molecular networks are learned by a graph neural network (GNN). The prediction part learns whether variants are driver variants using features of individual variants combined with the graph features learned in the graph part.
RESULTS: Net-DMPred, which considers molecular networks, performed better than conventional methods. Furthermore, the prediction performance differed by the molecular network structure used in learning, suggesting that it is important to consider not only the local network related to cancer but also the large-scale network in living organisms.
CONCLUSIONS: We propose a network-based machine learning method, Net-DMPred, for predicting cancer driver missense mutations. Our method enables us to consider the entire graph architecture representing the molecular network because it uses GNN. Net-DMPred is expected to detect driver mutations from a lot of missense mutations that are not known to be associated with cancer.
Affiliation(s)
- Narumi Hatano
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Mayumi Kamada
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Ryosuke Kojima
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
- Yasushi Okuno
  - Graduate School of Medicine, Kyoto University, Kyoto, Japan
  - HPC- and AI-driven Drug Development Platform Division, RIKEN Center for Computational Science (R-CCS), Kobe, Japan
|
5
|
Ozturk K, Panwala R, Sheen J, Ford K, Jayne N, Zhang DE, Hutter S, Haferlach T, Ideker T, Mali P, Carter H. Interface-guided phenotyping of coding variants in the transcription factor RUNX1 with SEUSS. bioRxiv 2023:2023.08.03.551876. [PMID: 37577681 PMCID: PMC10418284 DOI: 10.1101/2023.08.03.551876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Understanding the consequences of single amino acid substitutions in cancer driver genes remains an unmet need. Perturb-seq provides a tool to investigate the effects of individual mutations on cellular programs. Here we deploy SEUSS, a Perturb-seq like approach, to generate and assay mutations at physical interfaces of the RUNX1 Runt domain. We measured the impact of 115 mutations on RNA profiles in single myelogenous leukemia cells and used the profiles to categorize mutations into three functionally distinct groups: wild-type (WT)-like, loss-of-function (LOF)-like and hypomorphic. Notably, the largest concentration of functional mutations (non-WT-like) clustered at the DNA binding site and contained many of the more frequently observed mutations in human cancers. Hypomorphic variants shared characteristics with loss of function variants but had gene expression profiles indicative of response to neural growth factor and cytokine recruitment of neutrophils. Additionally, DNA accessibility changes upon perturbations were enriched for RUNX1 binding motifs, particularly near differentially expressed genes. Overall, our work demonstrates the potential of targeting protein interaction interfaces to better define the landscape of prospective phenotypes reachable by amino acid substitutions.
|
6
|
Ozturk K, Carter H. Publisher Correction: Predicting functional consequences of mutations using molecular interaction network features. Hum Genet 2022; 141:1593. [PMID: 36151408 PMCID: PMC9522748 DOI: 10.1007/s00439-022-02492-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Kivilcim Ozturk
  - Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Hannah Carter
  - Division of Medical Genetics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, USA
|
7
|
Computational interpretation of human genetic variation. Hum Genet 2022; 141:1545-1548. [PMID: 36149496 DOI: 10.1007/s00439-022-02483-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/04/2022]
|
8
|
Capriotti E, Fariselli P. Evaluating the relevance of sequence conservation in the prediction of pathogenic missense variants. Hum Genet 2022; 141:1649-1658. [DOI: 10.1007/s00439-021-02419-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/06/2021] [Accepted: 12/12/2021] [Indexed: 12/28/2022]
|
9
|
Arani AA, Sehhati M, Tabatabaiefar MA. Predicting deleterious missense genetic variants via integrative supervised nonnegative matrix tri-factorization. Sci Rep 2021; 11:23747. [PMID: 34887492 PMCID: PMC8660898 DOI: 10.1038/s41598-021-03230-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Accepted: 11/30/2021] [Indexed: 11/21/2022] Open
Abstract
Among the many types of genetic variation, missense variants are a major class, and a small subset of them may disrupt protein function and ultimately lead to human disease. Various machine learning methods have been proposed to differentiate deleterious from benign missense variants using a large number of features, including structure, sequence, interaction networks, gene-disease associations, and phenotypes. However, a reliable and accurate algorithm for merging heterogeneous information is still highly needed, as it could capture the complex interactions in the networks that genes participate in. In this study, we propose a new method based on non-negative matrix tri-factorization clustering. We outline two versions of the proposed method: a two-source and a three-source algorithm. The two-source algorithm aggregates individual deleteriousness prediction methods and the PPI network, and the three-source algorithm additionally incorporates gene-disease associations. Four benchmark datasets were employed for internal and external validation of both algorithms. The results on all datasets confirmed that our method outperforms most state-of-the-art variant prediction tools. Two key features of our variant effect prediction method are worth mentioning. First, although incorporating gene-disease information in the three-source algorithm can improve prediction performance compared with the two-source algorithm, our method is not hindered by type 2 circularity error, unlike some recent ensemble-based prediction methods. Type 2 circularity error occurs when a predictor annotates variants on the basis of the genes in which they are located. Second, our predictor outperforms other ensemble-based methods for variants in genes for which little pathogenicity information is available.
Affiliation(s)
- Asieh Amousoltani Arani
  - Department of Bioelectric and Biomedical Engineering, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  - Student Research Committee, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammadreza Sehhati
  - Department of Bioinformatics, School of Advanced Technologies in Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
  - Deputy of Research and Technology, GTaC Corp, Isfahan University of Medical Sciences, Isfahan, Iran
- Mohammad Amin Tabatabaiefar
  - Deputy of Research and Technology, GTaC Corp, Isfahan University of Medical Sciences, Isfahan, Iran
  - Department of Genetics and Molecular Biology, School of Medicine, Isfahan University of Medical Sciences, Isfahan, Iran
|