1
Kwon JJ, Pan J, Gonzalez G, Hahn WC, Zitnik M. On knowing a gene: A distributional hypothesis of gene function. Cell Syst 2024; 15:488-496. [PMID: 38810640 PMCID: PMC11189734 DOI: 10.1016/j.cels.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 02/25/2024] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.
Affiliation(s)
- Jason J Kwon
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Pan
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guadalupe Gonzalez
  - Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK
- William C Hahn
  - Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
  - Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA 02134, USA
2
Zhang P, Zhang B, Ji Y, Jiao J, Zhang Z, Tian C. Cofitness network connectivity determines a fuzzy essential zone in open bacterial pangenome. mLife 2024; 3:277-290. [PMID: 38948139 PMCID: PMC11211677 DOI: 10.1002/mlf2.12132] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/20/2024] [Accepted: 04/24/2024] [Indexed: 07/02/2024]
Abstract
Most in silico evolutionary studies commonly assume that core genes are essential for cellular function, while accessory genes are dispensable, particularly in nutrient-rich environments. However, this assumption is seldom tested genetically within the pangenome context. In this study, we conducted a robust pangenomic Tn-seq analysis of fitness genes in a nutrient-rich medium for Sinorhizobium strains with a canonical open pangenome. To evaluate the robustness of fitness category assignment, Tn-seq data for three independent mutant libraries per strain were analyzed by three methods, which indicated that the Hidden Markov Model (HMM)-based method is most robust to variations between mutant libraries and not sensitive to data size, outperforming the Bayesian and Monte Carlo simulation-based methods. Consequently, the HMM method was used to classify the fitness category. Fitness genes, categorized as essential (ES), advantage (GA), and disadvantage (GD) genes for growth, are enriched in core genes, while nonessential genes (NE) are over-represented in accessory genes. Accessory ES/GA genes showed a lower fitness effect than core ES/GA genes. Connectivity degrees in the cofitness network decrease in the order of ES, GD, and GA/NE. In addition to accessory genes, 1599 out of 3284 core genes display differential essentiality across test strains. Within the pangenome core, both shared quasi-essential (ES and GA) and strain-dependent fitness genes are enriched in similar functional categories. Our analysis demonstrates a considerable fuzzy essential zone determined by cofitness connectivity degrees in the Sinorhizobium pangenome and highlights the power of the cofitness network in understanding the genetic basis of ever-increasing prokaryotic pangenome data.
Affiliation(s)
- Pan Zhang
  - State Key Laboratory of Plant Environmental Resilience, and College of Biological Sciences, China Agricultural University, Beijing, China
  - MOA Key Laboratory of Soil Microbiology, and Rhizobium Research Center, China Agricultural University, Beijing, China
  - Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Biliang Zhang
  - MOA Key Laboratory of Soil Microbiology, and Rhizobium Research Center, China Agricultural University, Beijing, China
  - State Key Laboratory of Livestock and Poultry Biotechnology Breeding, and College of Biological Sciences, China Agricultural University, Beijing, China
- Yuan-Yuan Ji
  - State Key Laboratory of Plant Environmental Resilience, and College of Biological Sciences, China Agricultural University, Beijing, China
  - MOA Key Laboratory of Soil Microbiology, and Rhizobium Research Center, China Agricultural University, Beijing, China
- Jian Jiao
  - State Key Laboratory of Plant Environmental Resilience, and College of Biological Sciences, China Agricultural University, Beijing, China
  - MOA Key Laboratory of Soil Microbiology, and Rhizobium Research Center, China Agricultural University, Beijing, China
- Ziding Zhang
  - State Key Laboratory of Livestock and Poultry Biotechnology Breeding, and College of Biological Sciences, China Agricultural University, Beijing, China
- Chang-Fu Tian
  - State Key Laboratory of Plant Environmental Resilience, and College of Biological Sciences, China Agricultural University, Beijing, China
  - MOA Key Laboratory of Soil Microbiology, and Rhizobium Research Center, China Agricultural University, Beijing, China
3
Jung S, Wang S, Lee D. CancerGATE: Prediction of cancer-driver genes using graph attention autoencoders. Comput Biol Med 2024; 176:108568. [PMID: 38744009 DOI: 10.1016/j.compbiomed.2024.108568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 04/13/2024] [Accepted: 05/05/2024] [Indexed: 05/16/2024]
Abstract
Discovery of cancer type-specific driver genes is important for understanding the molecular mechanisms of each cancer type and for providing proper treatment. Recently, graph deep learning methods have become widely used in finding cancer-driver genes. However, previous methods had limited performance in individual cancer types due to the small number of cancer-driver genes used in training and biases toward the cancer-driver genes used in training the models. Here, we introduce a novel pipeline, CancerGATE, that predicts cancer-driver genes using a graph attention autoencoder (GATE) to learn in a self-supervised manner and can be applied to each of the cancer types. CancerGATE utilizes biological network topology and multi-omics data from 20,079 samples across 15 cancer types in The Cancer Genome Atlas (TCGA). Attention coefficients calculated in the model are used to prioritize cancer-driver genes by comparing coefficients of cancer and normal contexts. CancerGATE shows a higher AUPRC, with a difference ranging from 1.5% to 36.5%, compared to the previous graph deep learning models in each cancer type. We also show that CancerGATE is free from the bias toward cancer-driver genes used in training, revealing mechanisms of the cancer-driver genes in specific cancer types. Finally, we propose novel cancer-driver gene candidates that could be therapeutic targets for specific cancer types.
Affiliation(s)
- Seunghwan Jung
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea
- Seunghyun Wang
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea
- Doheon Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon 34141, Republic of Korea
4
Pacini C, Duncan E, Gonçalves E, Gilbert J, Bhosle S, Horswell S, Karakoc E, Lightfoot H, Curry E, Muyas F, Bouaboula M, Pedamallu CS, Cortes-Ciriano I, Behan FM, Zalmas LP, Barthorpe A, Francies H, Rowley S, Pollard J, Beltrao P, Parts L, Iorio F, Garnett MJ. A comprehensive clinically informed map of dependencies in cancer cells and framework for target prioritization. Cancer Cell 2024; 42:301-316.e9. [PMID: 38215750 DOI: 10.1016/j.ccell.2023.12.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/05/2023] [Revised: 10/20/2023] [Accepted: 12/15/2023] [Indexed: 01/14/2024]
Abstract
Genetic screens in cancer cell lines inform gene function and drug discovery. More comprehensive screen datasets with multi-omics data are needed to enhance opportunities to functionally map genetic vulnerabilities. Here, we construct a second-generation map of cancer dependencies by annotating 930 cancer cell lines with multi-omic data and analyze relationships between molecular markers and cancer dependencies derived from CRISPR-Cas9 screens. We identify dependency-associated gene expression markers beyond driver genes, and observe many gene addiction relationships driven by gain of function rather than synthetic lethal effects. By combining clinically informed dependency-marker associations with protein-protein interaction networks, we identify 370 anti-cancer priority targets for 27 cancer types, many of which have network-based evidence of a functional link with a marker in a cancer type. Mapping these targets to sequenced tumor cohorts identifies tractable targets in different cancer types. This target prioritization map enhances understanding of gene dependencies and identifies candidate anti-cancer targets for drug development.
Affiliation(s)
- Clare Pacini
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Emma Duncan
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Emanuel Gonçalves
  - Instituto Superior Técnico (IST), Universidade de Lisboa, 1049-001 Lisboa, Portugal; INESC-ID, 1000-029 Lisboa, Portugal
- James Gilbert
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Shriram Bhosle
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Stuart Horswell
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Emre Karakoc
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Howard Lightfoot
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Ed Curry
  - Genome Biology, Genomic Sciences, GSK, Stevenage, UK
- Francesc Muyas
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Isidro Cortes-Ciriano
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Fiona M Behan
  - Genome Biology, Genomic Sciences, GSK, Stevenage, UK
- Lykourgos-Panagiotis Zalmas
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Andrew Barthorpe
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Hayley Francies
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Genome Biology, Genomic Sciences, GSK, Stevenage, UK
- Steve Rowley
  - Sanofi Research and Development, Cambridge, MA, USA
- Jack Pollard
  - Sanofi Research and Development, Cambridge, MA, USA
- Pedro Beltrao
  - Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge CB10 1SD, UK
- Leopold Parts
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
- Francesco Iorio
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Human Technopole, V.le Rita Levi-Montalcini, 1, 20157 Milano, Italy
- Mathew J Garnett
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; Open Targets, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK
5
Nourbakhsh M, Degn K, Saksager A, Tiberti M, Papaleo E. Prediction of cancer driver genes and mutations: the potential of integrative computational frameworks. Brief Bioinform 2024; 25:bbad519. [PMID: 38261338 PMCID: PMC10805075 DOI: 10.1093/bib/bbad519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 11/27/2023] [Accepted: 12/11/2023] [Indexed: 01/24/2024] Open
Abstract
The vast amount of available sequencing data allows the scientific community to explore different genetic alterations that may drive cancer or favor cancer progression. Software developers have proposed a myriad of predictive tools, allowing researchers and clinicians to compare and prioritize driver genes and mutations and their relative pathogenicity. However, there is little consensus on the computational approach or a gold standard for comparison. Hence, benchmarking the different tools depends highly on the input data, indicating that overfitting is still a massive problem. One of the solutions is to limit the scope and usage of specific tools. However, such limitations force researchers to walk a tightrope between creating and using high-quality tools for a specific purpose and describing the complex alterations driving cancer. While the knowledge of cancer development increases daily, many bioinformatic pipelines rely on single nucleotide variants or alterations in a vacuum without accounting for cellular compartments, mutational burden or disease progression. Even within bioinformatics and computational cancer biology, the research fields work in silos, risking overlooking potential synergies or breakthroughs. Here, we provide an overview of databases and datasets for building or testing predictive cancer driver tools. Furthermore, we introduce predictive tools for driver genes, driver mutations, and the impact of these based on structural analysis. Additionally, we suggest and recommend directions in the field to avoid siloed research, moving towards integrative frameworks.
Affiliation(s)
- Mona Nourbakhsh
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Kristine Degn
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Astrid Saksager
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
- Matteo Tiberti
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
- Elena Papaleo
  - Cancer Systems Biology, Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, 2800 Lyngby, Denmark
  - Cancer Structural Biology, Danish Cancer Institute, 2100 Copenhagen, Denmark
6
Gaiteri C, Connell DR, Sultan FA, Iatrou A, Ng B, Szymanski BK, Zhang A, Tasaki S. Robust, scalable, and informative clustering for diverse biological networks. Genome Biol 2023; 24:228. [PMID: 37828545 PMCID: PMC10571258 DOI: 10.1186/s13059-023-03062-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2022] [Accepted: 09/19/2023] [Indexed: 10/14/2023] Open
Abstract
Clustering molecular data into informative groups is a primary step in extracting robust conclusions from big data. However, due to foundational issues in how they are defined and detected, such clusters are not always reliable, leading to unstable conclusions. We compare popular clustering algorithms across thousands of synthetic and real biological datasets, including a new consensus clustering algorithm, SpeakEasy2: Champagne. These tests identify trends in performance, show no single method is universally optimal, and allow us to examine factors behind variation in performance. Multiple metrics indicate SpeakEasy2 generally provides robust, scalable, and informative clusters for a range of applications.
Affiliation(s)
- Chris Gaiteri
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
  - Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
- David R Connell
  - Rush University Graduate College, Rush University Medical Center, Chicago, IL, USA
- Faraz A Sultan
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Artemis Iatrou
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
  - Department of Psychiatry, McLean Hospital, Harvard Medical School, Harvard University, Belmont, MA, USA
- Bernard Ng
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Boleslaw K Szymanski
  - Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Network Science and Technology Center, Rensselaer Polytechnic Institute, Troy, NY, USA
  - Academy of Social Sciences, Łódź, Poland
- Ada Zhang
  - Department of Psychiatry and Behavioral Sciences, SUNY Upstate Medical University, Syracuse, NY, USA
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
- Shinya Tasaki
  - Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, IL, USA
  - Department of Neurological Sciences, Rush University Medical Center, Chicago, IL, USA
7
She R, Fair T, Schaefer NK, Saunders RA, Pavlovic BJ, Weissman JS, Pollen AA. Comparative landscape of genetic dependencies in human and chimpanzee stem cells. Cell 2023; 186:2977-2994.e23. [PMID: 37343560 PMCID: PMC10461406 DOI: 10.1016/j.cell.2023.05.043] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 08/31/2022] [Revised: 03/14/2023] [Accepted: 05/26/2023] [Indexed: 06/23/2023]
Abstract
Comparative studies of great apes provide a window into our evolutionary past, but the extent and identity of cellular differences that emerged during hominin evolution remain largely unexplored. We established a comparative loss-of-function approach to evaluate whether human cells exhibit distinct genetic dependencies. By performing genome-wide CRISPR interference screens in human and chimpanzee pluripotent stem cells, we identified 75 genes with species-specific effects on cellular proliferation. These genes comprised coherent processes, including cell-cycle progression and lysosomal signaling, which we determined to be human-derived by comparison with orangutan cells. Human-specific robustness to CDK2 and CCNE1 depletion persisted in neural progenitor cells and cerebral organoids, supporting the G1-phase length hypothesis as a potential evolutionary mechanism in human brain expansion. Our findings demonstrate that evolutionary changes in human cells reshaped the landscape of essential genes and establish a platform for systematically uncovering latent cellular and molecular differences between species.
Affiliation(s)
- Richard She
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
- Tyler Fair
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
- Nathan K Schaefer
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Reuben A Saunders
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA; Department of Cellular and Molecular Pharmacology, University of California at San Francisco, San Francisco, CA, USA
- Bryan J Pavlovic
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Jonathan S Weissman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA; Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA; David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Alex A Pollen
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA; Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
8
Usluer S, Hallast P, Crepaldi L, Zhou Y, Urgo K, Dincer C, Su J, Noell G, Alasoo K, El Garwany O, Gerety SS, Newman B, Dovey OM, Parts L. Optimized whole-genome CRISPR interference screens identify ARID1A-dependent growth regulators in human induced pluripotent stem cells. Stem Cell Reports 2023; 18:1061-1074. [PMID: 37028423 PMCID: PMC10202655 DOI: 10.1016/j.stemcr.2023.03.008] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/17/2022] [Revised: 03/07/2023] [Accepted: 03/13/2023] [Indexed: 04/09/2023] Open
Abstract
Perturbing expression is a powerful way to understand the role of individual genes, but can be challenging in important models. CRISPR-Cas screens in human induced pluripotent stem cells (iPSCs) are of limited efficiency due to DNA break-induced stress, while the less stressful silencing with an inactive Cas9 has been considered less effective so far. Here, we developed the dCas9-KRAB-MeCP2 fusion protein for screening in iPSCs from multiple donors. We found silencing in a 200 bp window around the transcription start site in polyclonal pools to be as effective as using wild-type Cas9 for identifying essential genes, but with much reduced cell numbers. Whole-genome screens to identify ARID1A-dependent dosage sensitivity revealed the PSMB2 gene, and enrichment of proteasome genes among the hits. This selective dependency was replicated with a proteasome inhibitor, indicating a targetable drug-gene interaction. Many more plausible targets in challenging cell models can be efficiently identified with our approach.
Affiliation(s)
- Yan Zhou
  - Wellcome Sanger Institute, Cambridge, UK
- Katie Urgo
  - Wellcome Sanger Institute, Cambridge, UK
- Jing Su
  - Wellcome Sanger Institute, Cambridge, UK
- Kaur Alasoo
  - Department of Computer Science, University of Tartu, Tartu, Estonia
- Ben Newman
  - Wellcome Sanger Institute, Cambridge, UK
- Leopold Parts
  - Wellcome Sanger Institute, Cambridge, UK; Department of Computer Science, University of Tartu, Tartu, Estonia
9
She R, Fair T, Schaefer NK, Saunders RA, Pavlovic BJ, Weissman JS, Pollen AA. Comparative landscape of genetic dependencies in human and chimpanzee stem cells. bioRxiv 2023:2023.03.19.533346. [PMID: 36993685 PMCID: PMC10055274 DOI: 10.1101/2023.03.19.533346] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/19/2023]
Abstract
Comparative studies of great apes provide a window into our evolutionary past, but the extent and identity of cellular differences that emerged during hominin evolution remain largely unexplored. We established a comparative loss-of-function approach to evaluate whether changes in human cells alter requirements for essential genes. By performing genome-wide CRISPR interference screens in human and chimpanzee pluripotent stem cells, we identified 75 genes with species-specific effects on cellular proliferation. These genes comprised coherent processes, including cell cycle progression and lysosomal signaling, which we determined to be human-derived by comparison with orangutan cells. Human-specific robustness to CDK2 and CCNE1 depletion persisted in neural progenitor cells, providing support for the G1-phase length hypothesis as a potential evolutionary mechanism in human brain expansion. Our findings demonstrate that evolutionary changes in human cells can reshape the landscape of essential genes and establish a platform for systematically uncovering latent cellular and molecular differences between species.
Affiliation(s)
- Richard She
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - These authors contributed equally: Richard She, Tyler Fair
- Tyler Fair
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Biomedical Sciences Graduate Program, University of California, San Francisco, San Francisco, CA, USA
  - These authors contributed equally: Richard She, Tyler Fair
- Nathan K. Schaefer
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Reuben A. Saunders
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Cellular and Molecular Pharmacology, University of California at San Francisco, San Francisco, CA, USA
- Bryan J. Pavlovic
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
- Jonathan S. Weissman
  - Whitehead Institute for Biomedical Research, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
  - David H. Koch Institute for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, MA 02142, USA
- Alex A. Pollen
  - Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA
  - Department of Neurology, University of California, San Francisco, San Francisco, CA, USA
  - Lead contact
10
Gheorghe V, Hart T. Optimal construction of a functional interaction network from pooled library CRISPR fitness screens. BMC Bioinformatics 2022; 23:510. [PMID: 36443674 PMCID: PMC9707256 DOI: 10.1186/s12859-022-05078-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2022] [Accepted: 11/23/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND: Functional interaction networks, where edges connect genes likely to operate in the same biological process or pathway, can be inferred from CRISPR knockout screens in cancer cell lines. Genes with similar knockout fitness profiles across a sufficiently diverse set of cell line screens are likely to be co-functional, and these "coessentiality" networks are increasingly powerful predictors of gene function and biological modularity. While several such networks have been published, most use different algorithms for each step of the network construction process.
RESULTS: In this study, we identify an optimal measure of functional interaction and test all combinations of options at each step (essentiality scoring, sample variance and covariance normalization, and similarity measurement) to identify best practices for generating a functional interaction network from CRISPR knockout data. We show that Bayes Factor and Ceres scores give the best results, that Ceres outperforms the newer Chronos scoring scheme, and that covariance normalization is a critical step in network construction. We further show that Pearson correlation, mathematically identical to ordinary least squares after covariance normalization, can be extended by using partial correlation to detect and amplify signals from "moonlighting" proteins which show context-dependent interaction with different partners.
CONCLUSIONS: We describe a systematic survey of methods for generating coessentiality networks from the Cancer Dependency Map data and provide a partial correlation-based approach for exploring context-dependent interactions.
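The similarity measures named in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the gene names and fitness scores are hypothetical, the essentiality scoring and covariance normalization steps are omitted, and only the textbook first-order partial correlation is shown. Two genes whose knockout fitness profiles correlate across cell lines are candidate co-functional partners. Partial correlation asks how much of that association remains after accounting for a third profile.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length fitness profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical knockout fitness scores across six cell lines
# (more negative = stronger fitness defect); illustrative values only.
profiles = {
    "GENE_A": [-1.0, -0.8, 0.1, 0.0, -0.9, 0.2],
    "GENE_B": [-0.9, -0.7, 0.0, 0.1, -1.0, 0.1],
    "GENE_C": [0.2, -0.1, -0.8, -0.9, 0.1, -1.0],
}

r_ab = pearson(profiles["GENE_A"], profiles["GENE_B"])
r_ab_given_c = partial_corr(
    profiles["GENE_A"], profiles["GENE_B"], profiles["GENE_C"]
)
print(f"Pearson r(A, B) = {r_ab:.3f}")
print(f"Partial r(A, B | C) = {r_ab_given_c:.3f}")
```

In a full network, `pearson` would be evaluated for every gene pair, and edges kept above a chosen threshold. The threshold and the choice of conditioning profile are analysis decisions that this sketch does not address.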
Affiliation(s)
- Veronica Gheorghe
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Graduate School of Biomedical Sciences, The University of Texas MD Anderson Cancer Center UTHealth, Houston, TX, USA
- Traver Hart
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
  - Department of Cancer Biology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA