1
Bernstein N, Spencer Chapman M, Nyamondo K, Chen Z, Williams N, Mitchell E, Campbell PJ, Cohen RL, Nangalia J. Analysis of somatic mutations in whole blood from 200,618 individuals identifies pervasive positive selection and novel drivers of clonal hematopoiesis. Nat Genet 2024; 56:1147-1155. [PMID: 38744975 PMCID: PMC11176083 DOI: 10.1038/s41588-024-01755-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/17/2024] [Indexed: 05/16/2024]
Abstract
Human aging is marked by the emergence of a tapestry of clonal expansions in dividing tissues, particularly evident in blood as clonal hematopoiesis (CH). CH, linked to cancer risk and aging-related phenotypes, often stems from somatic mutations in a set of established genes. However, the majority of clones lack known drivers. Here we infer gene-level positive selection in whole blood exomes from 200,618 individuals in UK Biobank. We identify 17 additional genes, ZBTB33, ZNF318, ZNF234, SPRED2, SH2B3, SRCAP, SIK3, SRSF1, CHEK2, CCDC115, CCL22, BAX, YLPM1, MYD88, MTA2, MAGEC3 and IGLL5, under positive selection at a population level, and validate this selection pattern in 10,837 whole genomes from single-cell-derived hematopoietic colonies. Clones with mutations in these genes grow in frequency and size with age, comparable to classical CH drivers. They correlate with heightened risk of infection, death and hematological malignancy, highlighting the significance of these additional genes in the aging process.
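One common way to quantify gene-level positive selection is the dN/dS ratio. It compares the rate of nonsynonymous mutations per nonsynonymous site with the rate of synonymous mutations per synonymous site. Values above 1 suggest that protein-altering mutations are being favored.

The Python sketch below is a naive per-gene estimate under stated assumptions: site counts are precomputed, and a 0.5 pseudocount is added. The names `GeneCounts` and `naive_dnds` are hypothetical. This is not the study's pipeline. It omits the sequence-context mutation model and the significance testing that a real analysis requires.

```python
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Hypothetical per-gene summary used only for this illustration."""
    n_nonsyn: int        # observed nonsynonymous (e.g. missense) mutations
    n_syn: int           # observed synonymous mutations
    sites_nonsyn: float  # possible nonsynonymous sites in the coding sequence
    sites_syn: float     # possible synonymous sites in the coding sequence


def naive_dnds(g: GeneCounts, pseudocount: float = 0.5) -> float:
    """Per-site nonsynonymous rate divided by per-site synonymous rate.

    The pseudocount keeps the ratio finite when a gene has no synonymous
    mutations. A ratio well above 1 hints at positive selection; this naive
    version ignores sequence-context mutation rates and has no significance test.
    """
    if g.sites_nonsyn <= 0 or g.sites_syn <= 0:
        raise ValueError("site counts must be positive")
    dn = (g.n_nonsyn + pseudocount) / g.sites_nonsyn
    ds = (g.n_syn + pseudocount) / g.sites_syn
    return dn / ds


# Hypothetical gene: 30 nonsynonymous vs 2 synonymous mutations.
print(round(naive_dnds(GeneCounts(30, 2, 3000.0, 1000.0)), 2))  # prints 4.07
```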
Affiliation(s)
- Michael Spencer Chapman
- Wellcome Sanger Institute, Hinxton, UK
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Kudzai Nyamondo
- Wellcome Sanger Institute, Hinxton, UK
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Zhenghao Chen
- Calico Life Sciences LLC, South San Francisco, CA, USA
- Emily Mitchell
- Wellcome Sanger Institute, Hinxton, UK
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
- Jyoti Nangalia
- Wellcome Sanger Institute, Hinxton, UK
- Wellcome-MRC Cambridge Stem Cell Institute, Jeffrey Cheah Biomedical Centre, University of Cambridge, Cambridge, UK
2
Zhang J, Croft J, Le A. Familial CCM Genes Might Not Be Main Drivers for Pathogenesis of Sporadic CCMs-Genetic Similarity between Cancers and Vascular Malformations. J Pers Med 2023; 13:jpm13040673. [PMID: 37109059 PMCID: PMC10143507 DOI: 10.3390/jpm13040673] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 03/20/2023] [Revised: 04/05/2023] [Accepted: 04/15/2023] [Indexed: 04/29/2023]
Abstract
Cerebral cavernous malformations (CCMs) are abnormally dilated intracranial capillaries that form cerebrovascular lesions with a high risk of hemorrhagic stroke. Recently, several somatic "activating" gain-of-function (GOF) point mutations in PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit p110α) were found to be the dominant mutations in the lesions of sporadic cerebral cavernous malformation (sCCM). This raises the possibility that CCMs, like other types of vascular malformations, fall within the PIK3CA-related overgrowth spectrum (PROS). However, this possibility has been challenged by different interpretations. In this review, we will continue our efforts to explain the coexistence of GOF point mutations in the PIK3CA gene and loss-of-function (LOF) mutations in CCM genes in the lesions of sCCM. We will also try to delineate the temporospatial relationship between mutagenic events and CCM lesions. GOF PIK3CA point mutations have been well studied as driver oncogene mutations in reproductive cancers, especially breast cancer. We will therefore perform a comparative meta-analysis of GOF PIK3CA point mutations in an attempt to demonstrate the genetic similarities shared by cancers and vascular anomalies.
Affiliation(s)
- Jun Zhang
- Departments of Molecular & Translational Medicine (MTM), Texas Tech University Health Science Center El Paso (TTUHSCEP), El Paso, TX 79905, USA
- Jacob Croft
- Departments of Molecular & Translational Medicine (MTM), Texas Tech University Health Science Center El Paso (TTUHSCEP), El Paso, TX 79905, USA
- Alexander Le
- Departments of Molecular & Translational Medicine (MTM), Texas Tech University Health Science Center El Paso (TTUHSCEP), El Paso, TX 79905, USA
3
Chen H, Peng F, Xu J, Wang G, Zhao Y. Increased expression of GPX4 promotes the tumorigenesis of thyroid cancer by inhibiting ferroptosis and predicts poor clinical outcomes. Aging (Albany NY) 2023; 15:230-245. [PMID: 36626251 PMCID: PMC9876627 DOI: 10.18632/aging.204473] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Accepted: 12/16/2022] [Indexed: 01/11/2023]
Abstract
BACKGROUND Ferroptosis plays a critical role in suppressing cancer progression, and its essential regulator is glutathione peroxidase 4 (GPX4). High GPX4 expression can inhibit accumulation of iron, thus suppressing ferroptosis. However, its function in thyroid cancer has not been fully illuminated. Here, we explore the effect of GPX4 on thyroid cancer tumorigenesis and prognosis. METHODS Based on The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, GPX4 expression was investigated in cancer tissues and adjacent tissues. We determined the biological functions of GPX4-associated differentially expressed genes (DEGs) by using the "clusterProfiler" R package. In addition, the predictive value of GPX4 in thyroid cancer was assessed by using Cox regression analysis and nomograms. Finally, we conducted several in vitro experiments to determine the influence of GPX4 expression on proliferation and ferroptosis in thyroid cancer cells. RESULTS GPX4 expression was obviously elevated in thyroid cancer tissues compared with normal tissues. Biological function analysis indicated enrichment in muscle contraction, contractile fiber, metal ion transmembrane transporter activity, and complement and coagulation cascades. GPX4 overexpression was associated with stage T3-T4 and pathologic stage III-IV in thyroid cancer patients. Cox regression analysis indicated that GPX4 may be a risk factor for the overall survival of thyroid cancer patients. In vitro research showed that knockdown of GPX4 suppressed proliferation and induced ferroptosis in thyroid cancer cells. CONCLUSIONS GPX4 overexpression in thyroid cancer might play an essential role in tumorigenesis and may have prognostic value for thyroid cancer patients.
Affiliation(s)
- Huanjie Chen
- Department of General Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning Province, People’s Republic of China
- Fang Peng
- Department of Pathology, The Second Hospital of Dalian Medical University, Dalian, Liaoning Province, People’s Republic of China
- Jingchao Xu
- Department of General Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning Province, People’s Republic of China
- Guangzhi Wang
- Department of General Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning Province, People’s Republic of China
- Yongfu Zhao
- Department of General Surgery, The Second Hospital of Dalian Medical University, Dalian, Liaoning Province, People’s Republic of China
4
Siddique A, Bashir S, Abbas M. Pharmacogenetics of Anticancer Drugs: Clinical Response and Toxicity. Cancer Treat Res 2023; 185:141-175. [PMID: 37306909 DOI: 10.1007/978-3-031-27156-4_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Cancer is the most challenging disease for medical professionals to treat. The factors underlying the complicated situation include anticancer drug-associated toxicity, non-specific response, low therapeutic window, variable treatment outcomes, development of drug resistance, treatment complications, and cancer recurrence. The remarkable advancement in biomedical sciences and genetics over the past few decades, however, is changing the dire situation. The discovery of gene polymorphisms, gene expression, biomarkers, particular molecular targets and pathways, and drug-metabolizing enzymes has paved the way for the development and provision of targeted and individualized anticancer treatment. Pharmacogenetics is the study of genetic factors that can affect clinical responses and the pharmacokinetic and pharmacodynamic behavior of drugs. This chapter emphasizes the pharmacogenetics of anticancer drugs and its applications in three areas. The first is improving treatment outcomes and the selectivity and toxicity of drugs. The second is discovering and developing personalized anticancer drugs. The third is genetic methods for predicting drug response and toxicity.
Affiliation(s)
- Ammara Siddique
- Faculty of Pharmacy, Bahauddin Zakariya University, Multan, Pakistan
- Samra Bashir
- Faculty of Pharmacy, Capital University of Science and Technology, Islamabad, Pakistan
- Mateen Abbas
- Faculty of Pharmacy, Capital University of Science and Technology, Islamabad, Pakistan
5
He Z, Lin Y, Wei R, Liu C, Jiang D. Repulsion and attraction in searching: A hybrid algorithm based on gravitational kernel and vital few for cancer driver gene prediction. Comput Biol Med 2022; 151:106236. [PMID: 36370584 DOI: 10.1016/j.compbiomed.2022.106236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2022] [Revised: 10/15/2022] [Accepted: 10/22/2022] [Indexed: 12/27/2022]
Abstract
By taking a new perspective to combine a machine learning method with an evolutionary algorithm, a new hybrid algorithm is developed to predict cancer driver genes. First, inspired by the global search strategy of evolutionary algorithms, a gravitational kernel is proposed to act on the full range of gene features. Constructed by fusing protein-protein interaction (PPI) and mutation features, the gravitational kernel is capable of producing repulsion effects. Candidate genes with greater mutation effects and PPI have higher similarity scores. Owing to repulsion, the similarity scores of these promising genes are larger than those of ordinary genes, which makes these promising genes easier to find. Second, inspired by the idea of elite populations in evolutionary algorithms, the concept of vital few is proposed. Acting at a local scale, it targets the candidate genes associated with vital few genes. Under the attraction effect, these vital few driver genes attract genes with similar mutational effects, which leads to greater similarity scores. Lastly, the model and parameters are optimized using an evolutionary algorithm to obtain the optimal model and parameters for cancer driver gene prediction. The method is compared with six other advanced methods of cancer driver gene prediction. According to the experimental results, the proposed method outperforms these six state-of-the-art algorithms on the pan-oncogene dataset.
Affiliation(s)
- Zhihui He
- Department of Computer Science, Shantou University, 515063, China
- Yingqing Lin
- Department of Computer Science, Shantou University, 515063, China
- Runguo Wei
- Department of Computer Science, Shantou University, 515063, China
- Cheng Liu
- Department of Computer Science, Shantou University, 515063, China
- Dazhi Jiang
- Department of Computer Science, Shantou University, 515063, China; Guangdong Provincial Key Laboratory of Information Security Technology, Sun Yat-sen University, Guangzhou 510399, China
6
Shen X, Mo X, Tan W, Mo X, Li L, Yu F, He J, Deng Z, Xing S, Chen Z, Yang J. KIAA1199 Correlates With Tumor Microenvironment and Immune Infiltration in Lung Adenocarcinoma as a Potential Prognostic Biomarker. Pathol Oncol Res 2022; 28:1610754. [PMID: 36419650 PMCID: PMC9676226 DOI: 10.3389/pore.2022.1610754] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/08/2022] [Accepted: 10/25/2022] [Indexed: 09/05/2023]
Abstract
Background: KIAA1199 has been considered a key regulator of carcinogenesis. However, the relationship between KIAA1199 and immune infiltrates, as well as its prognostic value in lung adenocarcinoma (LUAD), remains unclear. Methods: The expression of KIAA1199 and its influence on tumor prognosis were analyzed using a series of databases: TIMER, GEPIA, UALCAN, LCE, PrognoScan and Kaplan-Meier Plotter. Further, immunohistochemistry (IHC), western blot (WB) and receiver operating characteristic (ROC) curve analyses were performed to verify our findings. cBioPortal was used to investigate the genomic alterations of KIAA1199. Prediction of candidate microRNAs (miRNAs) and transcription factors (TFs) targeting KIAA1199, as well as GO and KEGG analyses, were performed based on LinkedOmics. TIMER and TISIDB databases were used to explore the relationship between KIAA1199 and tumor immune infiltration. Results: High expression of KIAA1199 was identified in LUAD and lung squamous cell carcinoma (LUSC) patients. High expression of KIAA1199 indicated a worse prognosis in LUAD patients. IHC and WB analyses showed that the expression level of KIAA1199 was higher in tumor tissues than in adjacent tissues. GO and KEGG analyses indicated that KIAA1199 was mainly involved in extracellular matrix (ECM)-receptor interaction and extracellular matrix structural constituent. KIAA1199 was positively correlated with infiltrating levels of CD4+ T cells, macrophages, neutrophils and dendritic cells. It also showed a positive relationship with the expression of immune marker subsets of a variety of immunosuppressive cells. Conclusion: High expression of KIAA1199 predicts a poor prognosis in LUAD patients. KIAA1199 might exert its carcinogenic role in the tumor microenvironment by participating in extracellular matrix formation and regulating the infiltration of immune cells in LUAD. The results indicate that KIAA1199 might be a novel biomarker for evaluating prognosis and immune cell infiltration in LUAD.
Affiliation(s)
- Xiaoju Shen
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Department of Pharmacy, The First Affiliated Hospital of Guangxi Medical University, Nanning, China
- Xiaocheng Mo
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Weidan Tan
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Xiaoxiang Mo
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Li Li
- Department of Pharmacology, Guangxi Institute of Chinese Medicine and Pharmaceutical Science, Nanning, China
- Fei Yu
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Jingchuan He
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Zhihua Deng
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Shangping Xing
- Guangxi Key Laboratory of Bioactive Molecules Research and Evaluation, School of Pharmacy, Guangxi Medical University, Nanning, China
- Zhiquan Chen
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
- Jie Yang
- Department of Pharmacology, School of Pharmacy, Guangxi Medical University, Nanning, China
7
Alfonsi T, Bernasconi A, Canakoglu A, Masseroli M. Genomic data integration and user-defined sample-set extraction for population variant analysis. BMC Bioinformatics 2022; 23:401. [PMID: 36175857 PMCID: PMC9520931 DOI: 10.1186/s12859-022-04927-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 09/13/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Population variant analysis is of great importance for gathering insights into the links between human genotype and phenotype. The 1000 Genomes Project established a valuable reference for human genetic variation; however, the integrative use of the corresponding data with other datasets within existing repositories and pipelines is not fully supported. Particularly, there is a pressing need for flexible and fast selection of population partitions based on their variant and metadata-related characteristics. RESULTS Here, we target general germline or somatic mutation data sources for their seamless inclusion within an interoperable-format repository, supporting integration among them and with other genomic data, as well as their integrated use within bioinformatic workflows. In addition, we provide VarSum, a data summarization service working on sub-populations of interest selected using filters on population metadata and/or variant characteristics. The service is developed as an optimized computational framework with an Application Programming Interface (API) that can be called from within any existing computing pipeline or programming script. Provided example use cases of biological interest show the relevance, power and ease of use of the API functionalities. CONCLUSIONS The proposed data integration pipeline and data set extraction and summarization API pave the way for solid computational infrastructures that quickly process cumbersome variation data, and allow biologists and bioinformaticians to easily perform scalable analysis on user-defined partitions of large cohorts from increasingly available genetic variation studies. With the current tendency to large (cross)nation-wide sequencing and variation initiatives, we expect an ever growing need for the kind of computational support hereby proposed.
Affiliation(s)
- Tommaso Alfonsi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
- Anna Bernasconi
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
- Arif Canakoglu
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy; Dipartimento di Anestesia, Rianimazione ed Emergenza-Urgenza, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, Policlinico di Milano, Via Francesco Sforza, 35, 20122, Milan, Italy
- Marco Masseroli
- Department of Electronics, Information and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, 20133, Milan, Italy
8
Bao G, Guan X, Liang J, Yao Y, Xiang Y, Li T, Zhong X. A Germline Mutation in ATR Is Associated With Lung Adenocarcinoma in Asian Patients. Front Oncol 2022; 12:855305. [PMID: 35712480 PMCID: PMC9195140 DOI: 10.3389/fonc.2022.855305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2022] [Accepted: 04/26/2022] [Indexed: 12/01/2022]
Abstract
BACKGROUND Familial lung cancer (FLC) accounts for 8% of lung adenocarcinoma. A few germline mutations are known to be associated with increased risk and may provide new screening and treatment options. The goal of this study is to identify an FLC gene among three members of an FLC family. METHODS To uncover somatic and germline mutations linked with familial lung cancer, whole exome sequencing was performed on surgical tissues and peripheral blood from three sisters in a family diagnosed with lung adenocarcinoma (LUAD). In addition, single-cell RNA sequencing (scRNA-seq) and bulk RNA sequencing data from public databases were used to assess gene expression levels. RESULTS The Ataxia Telangiectasia and Rad3-Related Protein (ATR) gene mutation c.7667C>G (p.T2556S) was found in the 3 patients with familial lung cancer. Whole-genome sequencing revealed that the three sisters exhibited similar somatic mutation patterns. Besides ATR mutations, commonly mutated genes that characterize LUAD (BRCA1, EGFR, and ROS1) were also found in 5 tumor samples. Single-cell sequencing data from LUAD patients showed higher ATR expression in immune cells from tumor patients than from normal controls. ATR expression in stromal cells showed the opposite pattern. CONCLUSION We found a germline mutation in the ATR gene in three sisters of a Chinese family affected by familial lung cancer, which may be a genetic factor for lung cancer susceptibility.
Affiliation(s)
- Guangyao Bao
- Department of Thoracic Surgery, First Affiliated Hospital, China Medical University, Shenyang, China
- Xiaojiao Guan
- Department of Pathology, Shengjing Hospital, China Medical University, Shenyang, China
- Jie Liang
- Department of Thoracic Surgery, First Affiliated Hospital, China Medical University, Shenyang, China
- Yao Yao
- Department of Thoracic Surgery, First Affiliated Hospital, China Medical University, Shenyang, China
- Yifan Xiang
- Department of Thoracic Surgery, First Affiliated Hospital, China Medical University, Shenyang, China
- Tian Li
- School of Basic Medicine, Fourth Military Medical University, Xi’an, China
- Xinwen Zhong
- Department of Thoracic Surgery, First Affiliated Hospital, China Medical University, Shenyang, China
9
Chen Z, Lu Y, Cao B, Zhang W, Edwards A, Zhang K. Driver gene detection through Bayesian network integration of mutation and expression profiles. Bioinformatics 2022; 38:2781-2790. [PMID: 35561191 PMCID: PMC9113331 DOI: 10.1093/bioinformatics/btac203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2021] [Revised: 03/12/2022] [Accepted: 04/06/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION The identification of mutated driver genes and the corresponding pathways is one of the primary goals in understanding tumorigenesis at the patient level. Integration of multi-dimensional genomic data from existing repositories, e.g., The Cancer Genome Atlas (TCGA), offers an effective way to tackle this issue. In this study, we aimed to leverage the complementary genomic information of individuals and create an integrative framework to identify cancer-related driver genes. Specifically, based on pinpointed differentially expressed genes, variants in somatic mutations and a gene interaction network, we proposed an unsupervised Bayesian network integration (BNI) method to detect driver genes and estimate the disease propagation at the patient and/or cohort levels. This new method first captures inherent structural information to construct a functional gene mutation network and then extracts the driver genes and their controlled downstream modules using the minimum cover subset method. RESULTS Using other credible sources (e.g. Cancer Gene Census and Network of Cancer Genes), we validated the driver genes predicted by the BNI method in three TCGA pan-cancer cohorts. The proposed method provides an effective approach to address tumor heterogeneity faced by personalized medicine. The pinpointed drivers warrant further wet laboratory validation. AVAILABILITY AND IMPLEMENTATION The supplementary tables and source code can be obtained from https://xavieruniversityoflouisiana.sharefile.com/d-se6df2c8d0ebe4800a3030311efddafe5. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
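The abstract says driver genes are extracted with a "minimum cover subset method". Finding a true minimum set cover is NP-hard, so a greedy approximation is a common way to illustrate the idea. The sketch below is a generic greedy set cover, not the BNI implementation. Each candidate gene maps to the set of downstream targets it reaches. The function name `greedy_cover` and the toy gene-to-target mapping are hypothetical.

```python
from __future__ import annotations


def greedy_cover(candidates: dict[str, set[str]], targets: set[str]) -> list[str]:
    """Greedy approximation to minimum set cover.

    candidates maps each candidate driver to the set of target genes it
    reaches; targets is the set that should be covered. Returns the chosen
    drivers in selection order. Targets that no candidate reaches stay uncovered.
    """
    uncovered = set(targets)
    chosen: list[str] = []
    while uncovered:
        # Candidate covering the most still-uncovered targets; iterating over
        # sorted names makes ties resolve alphabetically.
        best = max(sorted(candidates),
                   key=lambda g: len(candidates[g] & uncovered),
                   default=None)
        if best is None or not (candidates[best] & uncovered):
            break  # nothing left can be covered by any candidate
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


# Hypothetical toy network: each gene's set of reachable downstream targets.
network = {"GENE_A": {"t1", "t2", "t3"}, "GENE_B": {"t3", "t4"}, "GENE_C": {"t4"}}
print(greedy_cover(network, {"t1", "t2", "t3", "t4"}))  # ['GENE_A', 'GENE_B']
```

The greedy rule is a logarithmic-factor approximation, not an exact minimum. It is shown only because it is short and easy to check by hand.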
Affiliation(s)
- Zhong Chen
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- You Lu
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bo Cao
- Division of Basic and Pharmaceutical Sciences, College of Pharmacy, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Wensheng Zhang
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Bioinformatics Core of Xavier RCMI Center for Cancer Research, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Andrea Edwards
- Department of Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
- Kun Zhang
- To whom correspondence should be addressed
10
Ko S, Li GX, Choi H, Won JH. Computationally scalable regression modeling for ultrahigh-dimensional omics data with ParProx. Brief Bioinform 2021; 22:bbab256. [PMID: 34254998 PMCID: PMC8575036 DOI: 10.1093/bib/bbab256] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/07/2021] [Revised: 06/15/2021] [Accepted: 06/17/2021] [Indexed: 12/20/2022]
Abstract
Statistical analysis of ultrahigh-dimensional omics scale data has long depended on univariate hypothesis testing. With growing data features and samples, the obvious next step is to establish multivariable association analysis as a routine method to describe genotype-phenotype association. Here we present ParProx, a state-of-the-art implementation to optimize overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc, not during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables. In these examples, genomic regions and biological pathways serve as variable groups, which makes the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations.
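ParProx is described as fitting group lasso models. One standard way to fit such models is proximal gradient descent, whose key step shrinks each group of coefficients as a single block. The following pure-Python sketch assumes squared-error loss and non-overlapping groups only. ParProx itself differs in several ways: it targets time-to-event and classification models, handles overlapping groups through a latent-variable representation, and runs on parallel hardware. The function names `group_soft_threshold` and `group_lasso_ista` are hypothetical.

```python
import math


def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_g ||beta_g||_2 (non-overlapping groups).

    Each group's block of coefficients is shrunk toward zero by `threshold`
    in Euclidean norm; a block whose norm is at most `threshold` becomes
    exactly zero, which is how group lasso removes whole groups at once.
    Coefficients not listed in any group are returned unchanged (unpenalized).
    """
    out = list(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[i] ** 2 for i in idx))
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        for i in idx:
            out[i] = scale * beta[i]
    return out


def group_lasso_ista(X, y, groups, lam, step, n_iter=500):
    """Proximal gradient (ISTA) for (1/2n)||y - X beta||^2 + lam * sum_g ||beta_g||_2.

    X is a list of n rows with p features; `step` should not exceed 1 / L,
    where L is the largest eigenvalue of X^T X / n.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the squared-error term: X^T (X beta - y) / n.
        resid = [sum(X[r][j] * beta[j] for j in range(p)) - y[r] for r in range(n)]
        grad = [sum(X[r][j] * resid[r] for r in range(n)) / n for j in range(p)]
        # Gradient step followed by the group-wise shrinkage (proximal) step.
        beta = group_soft_threshold(
            [b - step * g for b, g in zip(beta, grad)], groups, step * lam
        )
    return beta


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.0, 2.0, 0.0]
print(group_lasso_ista(X, y, groups=[[0], [1]], lam=0.5, step=1.0))  # approximately [1.0, 0.0]
```

With one-feature groups, group lasso reduces to the ordinary lasso, so the toy result can be checked by hand. The first coefficient converges to 2(1 - λ) = 1.0, and the second stays at zero.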
Affiliation(s)
- Seyoon Ko
- Department of Statistics, Seoul National University, Republic of Korea
- Ginny X Li
- Department of Medicine, National University of Singapore, Singapore
- Hyungwon Choi
- Department of Medicine, National University of Singapore, Singapore
- Joong-Ho Won
- Department of Statistics, Seoul National University, Republic of Korea
11
Magraner-Pardo L, Laskowski RA, Pons T, Thornton JM. A computational and structural analysis of germline and somatic variants affecting the DDR mechanism, and their impact on human diseases. Sci Rep 2021; 11:14268. [PMID: 34253785 PMCID: PMC8275599 DOI: 10.1038/s41598-021-93715-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 03/12/2021] [Accepted: 06/22/2021] [Indexed: 12/02/2022]
Abstract
DNA-Damage Response (DDR) proteins are crucial for maintaining the integrity of the genome by identifying and repairing errors in DNA. Variants affecting their function can have severe consequences, since failure to repair damaged DNA can result in cells turning cancerous. Here, we compare germline and somatic variants in DDR genes, specifically looking at their locations in the corresponding three-dimensional (3D) structures, Pfam domains, and protein–protein interaction interfaces. We show that somatic variants in metastatic cases are more likely to be found in Pfam domains and protein interaction interfaces than are pathogenic germline variants or variants of unknown significance (VUS). We also show hotspots in the structures of the ATM and BRCA2 proteins where pathogenic germline variants and recurrent somatic variants from primary and metastatic tumours cluster together in 3D. Moreover, in the ATM, BRCA1 and BRCA2 genes from prostate cancer patients, the distributions of germline benign, pathogenic, VUS, and recurrent somatic variants differ across Pfam domains. Together, these results provide a better characterisation of the most recurrently affected regions in DDRs and could help in the understanding of individual susceptibility to tumour development.
Affiliation(s)
- Lorena Magraner-Pardo
- Prostate Cancer Clinical Unit, Spanish National Cancer Research Center (CNIO), Madrid, Spain; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Roman A Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Tirso Pons
- Department of Immunology and Oncology, National Center for Biotechnology, Spanish National Research Council (CNB-CSIC), Madrid, Spain
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
12
Kobren SN, Chazelle B, Singh M. PertInInt: An Integrative, Analytical Approach to Rapidly Uncover Cancer Driver Genes with Perturbed Interactions and Functionalities. Cell Syst 2020; 11:63-74.e7. [PMID: 32711844 PMCID: PMC7493809 DOI: 10.1016/j.cels.2020.06.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/02/2019] [Revised: 02/23/2020] [Accepted: 06/05/2020] [Indexed: 12/12/2022]
Abstract
A major challenge in cancer genomics is to identify genes with functional roles in cancer and uncover their mechanisms of action. We introduce an integrative framework that identifies cancer-relevant genes by pinpointing those whose interaction or other functional sites are enriched in somatic mutations across tumors. We derive analytical calculations that enable us to avoid time-prohibitive permutation-based significance tests, making it computationally feasible to simultaneously consider multiple measures of protein site functionality. Our accompanying software, PertInInt, combines knowledge about sites participating in interactions with DNA, RNA, peptides, ions, or small molecules with domain, evolutionary conservation, and gene-level mutation data. When applied to 10,037 tumor samples, PertInInt uncovers both known and newly predicted cancer genes, while additionally revealing what types of interactions or other functionalities are disrupted. PertInInt’s analysis demonstrates that somatic mutations are frequently enriched in interaction sites and domains and implicates interaction perturbation as a pervasive cancer-driving event. A fast, analytical framework called PertInInt enables efficient integration of multiple measures of protein site functionality—including interaction, domain, and evolutionary conservation—with gene-level mutation data in order to rapidly detect cancer driver genes along with their disrupted functionalities.
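PertInInt's abstract stresses replacing permutation-based significance tests with analytical calculations. The sketch below illustrates that trade-off on the simplest possible case: a single site set under an assumed uniform null for mutation positions. The closed-form binomial mean and variance give the z-score directly, while the simulated version reaches roughly the same number by resampling. This is not PertInInt's statistic, which combines several weighted measures of site functionality. The function names and the example numbers are hypothetical.

```python
import math
import random
import statistics


def analytic_site_zscore(n_mutations, n_in_sites, protein_length, site_length):
    """Closed-form z-score for the number of mutations inside a site set.

    Assumed null model: each mutation lands independently and uniformly on one
    of `protein_length` positions, so the count inside the `site_length`
    site positions is Binomial(n_mutations, site_length / protein_length).
    """
    p = site_length / protein_length
    var = n_mutations * p * (1 - p)
    if var == 0:
        raise ValueError("null variance is zero; z-score undefined")
    return (n_in_sites - n_mutations * p) / math.sqrt(var)


def simulated_site_zscore(n_mutations, n_in_sites, protein_length, site_length,
                          n_draws=10000, seed=0):
    """Same statistic estimated by brute-force resampling of mutation positions."""
    rng = random.Random(seed)
    # Positions 0 .. site_length-1 play the role of the site; under a uniform
    # null only the site's size matters, not where it sits in the protein.
    counts = [
        sum(rng.randrange(protein_length) < site_length for _ in range(n_mutations))
        for _ in range(n_draws)
    ]
    return (n_in_sites - statistics.fmean(counts)) / statistics.pstdev(counts)


# Hypothetical protein: 50 mutations, 15 of them in a 10-residue site of 100 residues.
print(analytic_site_zscore(50, 15, 100, 10))   # about 4.71
print(simulated_site_zscore(50, 15, 100, 10))  # close to the analytic value
```

The analytic version costs a few arithmetic operations. The simulated one needs about half a million random draws for this one small case.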
Affiliation(s)
- Shilpa Nadimpalli Kobren
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
- Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
13
Hristov BH, Chazelle B, Singh M. uKIN Combines New and Prior Information with Guided Network Propagation to Accurately Identify Disease Genes. Cell Syst 2020; 10:470-479.e3. [PMID: 32684276 PMCID: PMC7821437 DOI: 10.1016/j.cels.2020.05.008] [Received: 04/01/2020] [Revised: 04/24/2020] [Accepted: 05/19/2020] [Indexed: 12/23/2022]
Abstract
Protein interaction networks provide a powerful framework for identifying genes causal for complex genetic diseases. Here, we introduce a general framework, uKIN, that uses prior knowledge of disease-associated genes to guide, within known protein-protein interaction networks, random walks that are initiated from newly identified candidate genes. In large-scale testing across 24 cancer types, we demonstrate that our network propagation approach for integrating both prior and new information not only better identifies cancer driver genes than using either source of information alone but also readily outperforms other state-of-the-art network-based approaches. We also apply our approach to genome-wide association data to identify genes functionally relevant for several complex diseases. Overall, our work suggests that guided network propagation approaches that utilize both prior and new data are a powerful means to identify disease genes. uKIN is freely available for download at: https://github.com/Singh-Lab/uKIN.
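A minimal sketch of a guided random walk with restart, assuming a toy undirected network stored as an adjacency dict. Walks restart at the newly identified candidate genes, and, as a hypothetical stand-in for uKIN's own guidance mechanism (not reproduced here), each step is `boost` times more likely to move into a known disease gene than into any other neighbor. Genes are then ranked by their stationary visiting probability. The parameter names and values are illustrative assumptions.

```python
def guided_rwr(neighbors, new_genes, prior_genes, boost=3.0,
               restart=0.3, tol=1e-12, max_iter=10_000):
    """Stationary visit probabilities of a random walk that restarts at `new_genes`.

    neighbors: dict gene -> iterable of adjacent genes; every gene must be a key
    and have at least one neighbor. Steps into genes in `prior_genes` are
    up-weighted by `boost` (a hypothetical form of guidance).
    """
    nodes = list(neighbors)
    # Row-normalized, guided transition probabilities.
    trans = {}
    for u in nodes:
        weights = {v: (boost if v in prior_genes else 1.0) for v in neighbors[u]}
        total = sum(weights.values())
        trans[u] = {v: w / total for v, w in weights.items()}
    # Restart distribution: uniform over the new candidate genes.
    seed = {u: (1.0 / len(new_genes) if u in new_genes else 0.0) for u in nodes}
    p = dict(seed)
    for _ in range(max_iter):
        nxt = {u: restart * seed[u] for u in nodes}
        for u in nodes:
            for v, t in trans[u].items():
                nxt[v] += (1.0 - restart) * p[u] * t
        if sum(abs(nxt[u] - p[u]) for u in nodes) < tol:
            return nxt
        p = nxt
    return p


if __name__ == "__main__":
    net = {"N": ["A", "B"], "A": ["N", "P"], "B": ["N", "C"], "P": ["A"], "C": ["B"]}
    scores = guided_rwr(net, new_genes={"N"}, prior_genes={"P"})
    for gene, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(s, 3))
```

In this toy network the candidate gene N has two structurally identical neighbors, A and B, so any difference in their scores comes only from the guidance toward the prior gene P.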
Affiliation(s)
- Borislav H Hristov
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Bernard Chazelle
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
14
Przytycki PF, Singh M. Differential Allele-Specific Expression Uncovers Breast Cancer Genes Dysregulated by Cis Noncoding Mutations. Cell Syst 2020; 10:193-203.e4. [PMID: 32078798 PMCID: PMC7457951 DOI: 10.1016/j.cels.2020.01.002] [Received: 06/18/2019] [Revised: 12/04/2019] [Accepted: 01/22/2020] [Indexed: 01/23/2023]
Abstract
Identifying cancer-relevant mutations in noncoding regions is challenging due to the large numbers of such mutations, their low levels of recurrence, and difficulties in interpreting their functional impact. To uncover genes that are dysregulated due to somatic mutations in cis, we build upon the concept of differential allele-specific expression (ASE) and introduce methods to identify genes within an individual's cancer whose ASE differs from what is found in matched normal tissue. When applied to breast cancer tumor samples, our methods detect the known allele-specific effects of copy number variation and nonsense-mediated decay. Further, genes that are found to recurrently exhibit differential ASE across samples are cancer relevant. Genes with cis mutations are enriched for differential ASE, and we find 147 potentially functional noncoding mutations cis to genes that exhibit significant differential ASE. We conclude that differential ASE is a promising means for discovering gene dysregulation due to cis noncoding mutations.
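One simple way to quantify differential ASE at a single heterozygous site is to compare the alternate-allele fraction in tumor RNA with that in matched normal RNA. The sketch below applies a two-proportion z-test to hypothetical read counts; it is an illustrative assumption, not the statistical model used in the paper, and it ignores overdispersion and aggregation across sites within a gene.

```python
import math


def differential_ase(tumor_ref: int, tumor_alt: int,
                     normal_ref: int, normal_alt: int):
    """Two-proportion z-test comparing alt-allele fractions in tumor vs matched normal.

    Returns (difference in alt fraction, z-score, two-sided p-value).
    Assumes both samples have reads and the pooled fraction is strictly between 0 and 1.
    """
    n_tumor = tumor_ref + tumor_alt
    n_normal = normal_ref + normal_alt
    f_tumor = tumor_alt / n_tumor
    f_normal = normal_alt / n_normal
    pooled = (tumor_alt + normal_alt) / (n_tumor + n_normal)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_tumor + 1.0 / n_normal))
    z = (f_tumor - f_normal) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return f_tumor - f_normal, z, p_value


if __name__ == "__main__":
    # Hypothetical site: balanced in normal tissue, alt allele largely lost in the tumor.
    print(differential_ase(tumor_ref=90, tumor_alt=10, normal_ref=50, normal_alt=50))
```

Allele-specific copy number changes or nonsense-mediated decay in the tumor, the known effects the abstract mentions, would show up here as a large shift in the alternate-allele fraction relative to normal tissue.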
Affiliation(s)
- Pawel F Przytycki
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.
15
Brandes N, Linial N, Linial M. Quantifying gene selection in cancer through protein functional alteration bias. Nucleic Acids Res 2019; 47:6642-6655. [PMID: 31334812 PMCID: PMC6649814 DOI: 10.1093/nar/gkz546] [Received: 04/10/2019] [Revised: 06/03/2019] [Accepted: 06/16/2019] [Indexed: 11/14/2022]
Abstract
Compiling the catalogue of genes actively involved in cancer is an ongoing endeavor, with profound implications for the understanding and treatment of the disease. An abundance of computational methods has been developed to screen the genome for candidate driver genes based on genomic data of somatic mutations in tumors. Existing methods make many implicit and explicit assumptions about the distribution of random mutations. We present FABRIC, a new framework for quantifying the selection of genes in cancer by assessing the effects of de novo somatic mutations on protein-coding genes. Using a machine-learning model, we quantified the functional effects of ∼3M somatic mutations extracted from over 10 000 human cancer samples, and compared them against the effects of all possible single-nucleotide mutations in the coding human genome. We detected 593 protein-coding genes showing a statistically significant bias towards harmful mutations. These genes, discovered without any prior knowledge, show an overwhelming overlap with known cancer genes, but also include many overlooked genes. FABRIC is designed to avoid false discoveries by comparing each gene to its own background model using rigorous statistics, making minimal assumptions about the distribution of random somatic mutations. The framework is an open-source project with a simple command-line interface.
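The core comparison described here, scoring a gene's observed mutations against all possible single-nucleotide mutations in the same gene, can be sketched as a z-score of the mean predicted harm. This is an illustrative simplification: the scores are hypothetical numbers where higher means more damaging, the null treats observed mutations as independent draws from the gene's own background, and FABRIC's actual effect predictor and statistics are not reproduced.

```python
import math
from statistics import fmean, pvariance


def alteration_bias_z(observed_scores, background_scores):
    """z-score of the mean effect score of observed mutations vs. the gene's own background.

    observed_scores: predicted harm of the somatic mutations seen in the gene.
    background_scores: predicted harm of every possible single-nucleotide mutation in it.
    By the central limit theorem, the mean of n independent background draws has
    variance sigma^2 / n.
    """
    n = len(observed_scores)
    mu = fmean(background_scores)
    sigma2 = pvariance(background_scores, mu)
    return (fmean(observed_scores) - mu) / math.sqrt(sigma2 / n)


if __name__ == "__main__":
    background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical scores
    observed = [0.8, 0.9, 0.9, 0.7]                            # hypothetical mutations
    print(alteration_bias_z(observed, background))  # positive => bias toward harmful
```

Because each gene is compared only with its own possible mutations, genes of different length or sequence composition never share a single background model, which is the design motivation the abstract states.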
Affiliation(s)
- Nadav Brandes
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
- Nathan Linial
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
- Michal Linial
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Israel
16
Collier O, Stoven V, Vert JP. LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes. PLoS Comput Biol 2019; 15:e1007381. [PMID: 31568528 PMCID: PMC6786659 DOI: 10.1371/journal.pcbi.1007381] [Received: 08/28/2018] [Revised: 10/10/2019] [Accepted: 09/04/2019] [Indexed: 12/16/2022]
Abstract
Cancer driver genes, i.e., oncogenes and tumor suppressor genes, are involved in the acquisition of important functions in tumors, providing a selective growth advantage, allowing uncontrolled proliferation and avoiding apoptosis. It is therefore important to identify these driver genes, both for the fundamental understanding of cancer and to help find new therapeutic targets or biomarkers. Although the most frequently mutated driver genes have been identified, it is believed that many more remain to be discovered, particularly for driver genes specific to some cancer types. In this paper, we propose a new computational method called LOTUS to predict new driver genes. LOTUS is a machine-learning-based approach that can integrate various types of data in a versatile manner, including information about gene mutations and protein-protein interactions. In addition, LOTUS can predict cancer driver genes in a pan-cancer setting as well as for specific cancer types, using a multitask learning strategy to share information across cancer types. We empirically show that LOTUS outperforms five other state-of-the-art driver gene prediction methods, both in terms of intrinsic consistency and prediction accuracy, and provide predictions of new cancer genes across many cancer types.
Cancer development is driven by mutations and dysfunction of important, so-called cancer driver genes, which could be targeted by specific therapies. While a number of such cancer genes have already been identified, it is believed that many more remain to be discovered. To help prioritize experimental investigations of candidate genes, several computational methods have been proposed to rank promising candidates based on their mutations in large cohorts of cancer cases, or on their interactions with known driver genes in biological networks. We propose LOTUS, a new computational approach to identify genes with high oncogenic potential. LOTUS implements a machine learning approach to learn an oncogenic potential score from known driver genes, and brings two novelties compared to existing methods. First, it makes it easy to combine heterogeneous sources of information into the scoring function, which we illustrate by learning a scoring function from both known mutations in large cancer cohorts and interactions in biological networks. Second, using a multitask learning strategy, it can predict different driver genes for different cancer types, while sharing information between them to improve the prediction for every type. We provide experimental results showing that LOTUS significantly outperforms several state-of-the-art cancer gene prediction tools.
Affiliation(s)
- Olivier Collier
- Modal’X, UPL, Univ Paris Nanterre, F-92000 Nanterre, France
- * E-mail: (OC); (J-PV)
- Véronique Stoven
- MINES ParisTech, PSL University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Institut Curie, F-75248 Paris Cedex 5, France
- INSERM U900, F-75248 Paris Cedex 5, France
- Jean-Philippe Vert
- MINES ParisTech, PSL University, CBIO-Centre for Computational Biology, F-75006 Paris, France
- Google Research, Brain team, F-75009 Paris, France
- * E-mail: (OC); (J-PV)
17
Przytycki PF, Singh M. Correction to: Differential analysis between somatic mutation and germline variation profiles reveals cancer-related genes. Genome Med 2018; 10:35. [PMID: 29747675 PMCID: PMC5946516 DOI: 10.1186/s13073-018-0544-3] [Received: 04/30/2018] [Accepted: 04/30/2018] [Indexed: 11/10/2022]
18
Gallo Cantafio ME, Grillone K, Caracciolo D, Scionti F, Arbitrio M, Barbieri V, Pensabene L, Guzzi PH, Di Martino MT. From Single Level Analysis to Multi-Omics Integrative Approaches: A Powerful Strategy towards the Precision Oncology. High Throughput 2018; 7:ht7040033. [PMID: 30373182 PMCID: PMC6306876 DOI: 10.3390/ht7040033] [Received: 09/24/2018] [Revised: 10/09/2018] [Accepted: 10/22/2018] [Indexed: 02/06/2023]
Abstract
Integration of multi-omics data from different molecular levels with clinical data, as well as epidemiologic risk factors, represents an accurate and promising methodology to understand the complexity of biological systems in human diseases, including cancer. With the extensive use of novel technological platforms, large amounts of multidimensional data can be derived from the analysis of health and disease systems. Comprehensive analysis of multi-omics data in an integrated framework, which includes cumulative effects in the context of biological pathways, is therefore eagerly awaited. This strategy could allow the identification of pathway addiction in cancer cells that may be amenable to therapeutic intervention. However, translation into clinical settings requires an optimized integration of omics data with a clinical vision to fully exploit precision cancer medicine. We discuss the available technical approaches and more recent developments in this field.
Affiliation(s)
- Maria Eugenia Gallo Cantafio
- Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
- Katia Grillone
- Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
- Daniele Caracciolo
- Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
- Francesca Scionti
- Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
- Vito Barbieri
- Medical Oncology Unit, Mater Domini Hospital, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
- Licia Pensabene
- Department of Medical and Surgical Sciences, Pediatric Unit, Magna Graecia University, 88100 Catanzaro, Italy.
- Pietro Hiram Guzzi
- Department of Medical and Surgical Sciences, Magna Graecia University, 88100 Catanzaro, Italy.
- Maria Teresa Di Martino
- Department of Experimental and Clinical Medicine, Magna Graecia University, Salvatore Venuta University Campus, 88100 Catanzaro, Italy.
19
Autologous reference types can confound the detection of somatic mutation in solid cancers. DNA Repair (Amst) 2018; 69:6-13. [PMID: 30029072 DOI: 10.1016/j.dnarep.2018.07.002] [Received: 05/25/2018] [Revised: 06/17/2018] [Accepted: 07/03/2018] [Indexed: 11/22/2022]
Abstract
Somatic mutation calls have been shown to depend on the sequencing methods, analysis pipelines and validation methods used. Here we show the effect of autologous reference types on the detection of cancer-associated somatic mutations, using the somatic single nucleotide variations (SNVs) and clinical data of solid tumors from The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC). The distribution of somatic SNVs differed significantly among groups of autologous references in 6 cancers profiled by whole genome sequencing (WGS) and 5 cancers profiled by whole exome sequencing (WXS), especially in the protein-coding regions of 5 cancers after adjustment for age, gender and TNM stage. In addition, only 60.24% (95% CI: 49.65%-70.83%) of the somatic SNVs called against normal blood by WXS were also found among those called against normal solid tissue by WXS/WGS, while 31.78% (95% CI: 4.14%-59.42%) of the somatic SNVs called against normal tissue adjacent to the primary tumor by WXS were found among those called against normal blood by WXS/WGS. These findings suggest that more representative types of normal tissue should be included in the detection of cancer-associated somatic mutations.
20
Hristov BH, Singh M. Network-Based Coverage of Mutational Profiles Reveals Cancer Genes. Cell Syst 2017; 5:221-229.e4. [PMID: 28957656 PMCID: PMC5997485 DOI: 10.1016/j.cels.2017.09.003] [Received: 05/01/2017] [Revised: 08/28/2017] [Accepted: 09/06/2017] [Indexed: 12/21/2022]
Abstract
A central goal in cancer genomics is to identify the somatic alterations that underpin tumor initiation and progression. While commonly mutated cancer genes are readily identifiable, those that are rarely mutated across samples are difficult to distinguish from the large numbers of other infrequently mutated genes. We introduce a method, nCOP, that considers per-individual mutational profiles within the context of protein-protein interaction networks in order to identify small connected subnetworks of genes that, while not individually frequently mutated, comprise pathways that are altered across (i.e., "cover") a large fraction of individuals. By analyzing 6,038 samples across 24 different cancer types, we demonstrate that nCOP is highly effective in identifying cancer genes, including those with low mutation frequencies. Overall, our work demonstrates that combining per-individual mutational information with interaction networks is a powerful approach for tackling the mutational heterogeneity observed across cancers.
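The idea of a small connected subnetwork that "covers" many individuals can be illustrated with a simple greedy heuristic. This is a hypothetical sketch, not nCOP's actual optimization: it starts from the most frequently mutated gene and repeatedly adds the network neighbor that covers the most not-yet-covered individuals, so the chosen genes always stay connected. The network and mutation data are invented for illustration, and ties between equally good neighbors are broken arbitrarily.

```python
def greedy_connected_cover(neighbors, mutated_in, max_genes):
    """Grow a connected gene set that covers as many individuals as possible.

    neighbors: dict gene -> set of interacting genes (protein-protein network).
    mutated_in: dict gene -> set of individuals carrying a mutation in that gene.
    Returns (chosen genes in order of addition, set of covered individuals).
    """
    start = max(mutated_in, key=lambda g: len(mutated_in[g]))
    chosen = [start]
    covered = set(mutated_in[start])
    while len(chosen) < max_genes:
        # Only genes adjacent to the current subnetwork keep it connected.
        frontier = {v for u in chosen for v in neighbors.get(u, ()) if v not in chosen}
        if not frontier:
            break
        best = max(frontier, key=lambda g: len(mutated_in.get(g, set()) - covered))
        gain = mutated_in.get(best, set()) - covered
        if not gain:  # no neighbor covers anyone new
            break
        chosen.append(best)
        covered |= gain
    return chosen, covered


if __name__ == "__main__":
    net = {"G1": {"G2", "G4"}, "G2": {"G1", "G3"}, "G3": {"G2"}, "G4": {"G1"}}
    muts = {"G1": {1, 2, 3, 10}, "G2": {4, 9}, "G3": {5, 6, 7}, "G4": {8}}
    print(greedy_connected_cover(net, muts, max_genes=3))
```

Here G2 is mutated in only two individuals, yet it ends up in the same module as G1 and G3 because it connects them, which is the kind of rarely mutated gene the abstract says network context can recover.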
Affiliation(s)
- Borislav H Hristov
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
- Mona Singh
- Department of Computer Science, Princeton University, Princeton, NJ 08544, USA; Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA.