1
Wu Z, Famous M, Stoikidou T, Bowden FES, Dominic G, Huws SA, Godoy-Santos F, Oyama LB. Unravelling AMR dynamics in the rumenofaecobiome: Insights, challenges and implications for One Health. Int J Antimicrob Agents 2025; 66:107494. [PMID: 40120959 DOI: 10.1016/j.ijantimicag.2025.107494] [Received: 07/10/2024] [Revised: 03/01/2025] [Accepted: 03/13/2025] [Indexed: 03/25/2025]
Abstract
Antimicrobial resistance (AMR) is a critical global threat to human, animal and environmental health, exacerbated by horizontal gene transfer (HGT) via mobile genetic elements. This poses significant challenges that have a negative impact on the sustainability of the One Health approach, hindering its long-term viability and effectiveness in addressing the interconnectedness of global health. Recent studies on livestock animals, specifically ruminants, indicate that culturable ruminal bacteria harbour AMR genes with the potential for HGT. However, these studies have focused predominantly on using the faecobiome as a proxy to the rumen microbiome or using easily isolated and culturable bacteria, overlooking the unculturable population. These unculturable microbial groups could have a profound influence on the rumen resistome and AMR dynamics within livestock ecosystems, potentially holding critical insights for advanced understanding of AMR in One Health. In order to address this gap, this review of current research on the burden of AMR in livestock was undertaken, and it is proposed that combined study of the rumen microbiome and faecobiome, termed the 'rumenofaecobiome', should be performed to enhance understanding of the risks of AMR in ruminant livestock. This review discusses the complexities of the rumen microbiome and the risks of AMR transmission in this microbiome in a One Health context. AMR transmission dynamics and methodologies for assessing the risks of AMR in livestock are summarized, and future considerations for researching the impact of AMR in the rumen microbiome and the implications within the One Health framework are discussed.
Affiliation(s)
- Ziming Wu
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Mustasim Famous
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK; Department of Animal Science, Khulna Agricultural University, Khulna, Bangladesh
- Theano Stoikidou
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Freya E S Bowden
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Gama Dominic
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Sharon A Huws
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Fernanda Godoy-Santos
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
- Linda B Oyama
  - School of Biological Science, Institute for Global Food Security, Queen's University Belfast, Belfast, UK
2
Valle F, Caselle M, Osella M. Exploring the latent space of transcriptomic data with topic modeling. NAR Genom Bioinform 2025; 7:lqaf049. [PMID: 40264683 PMCID: PMC12012681 DOI: 10.1093/nargab/lqaf049] [Received: 11/06/2024] [Revised: 04/03/2025] [Accepted: 04/11/2025] [Indexed: 04/24/2025] [Open Access]
Abstract
The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction methods are popular go-to methods to identify basic structures in these datasets. At the same time, different topic modeling techniques have been developed to organize the deluge of available data of natural language using their latent topical structure. This paper leverages the statistical analogies between text and transcriptomic datasets to compare different topic modeling methods when applied to gene expression data. Specifically, we test their accuracy in the specific task of discovering and reconstructing the tissue structure of the human transcriptome and distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by different methods, highlight their differences, and their pros and cons across different tasks. We focus in particular on how different statistical priors can affect the results and their interpretability. Finally, we show that the latent topic space can be a useful low-dimensional embedding space, where a basic neural network classifier can annotate transcriptomic profiles with high accuracy.
Affiliation(s)
- Filippo Valle
  - Physics Department, University of Turin and INFN, Via Pietro Giuria 1, 12125 Torino, Italy
- Michele Caselle
  - Physics Department, University of Turin and INFN, Via Pietro Giuria 1, 12125 Torino, Italy
- Matteo Osella
  - Physics Department, University of Turin and INFN, Via Pietro Giuria 1, 12125 Torino, Italy
3
Phan A, Joshi P, Kadelka C, Friedberg I. A longitudinal analysis of function annotations of the human proteome reveals consistently high biases. Database (Oxford) 2025; 2025:baaf036. [PMID: 40338520 PMCID: PMC12060720 DOI: 10.1093/database/baaf036] [Received: 10/18/2024] [Revised: 02/28/2025] [Accepted: 04/08/2025] [Indexed: 05/09/2025]
Abstract
The resources required to study gene function are limited, especially when considering the number of genes in the human genome and the complexity of their function. Therefore, genes are prioritized for experimental studies based on many different considerations, including, but not limited to, perceived biomedical importance, such as disease-associated genes, or the understanding of biological processes, such as cell signalling pathways. At the same time, most genes are not studied or are under-characterized, which hampers our understanding of their function and potential effects on human health and wellness. Understanding function annotation disparity is a necessary first step toward understanding how much functional knowledge is gained from the human genome, and toward guidelines for better targeting of future studies of genes in the human genome. Here, we present a comprehensive longitudinal analysis of the human proteome utilizing data analysis tools from economics and information theory. Specifically, we view the human proteome as a population of proteins within a knowledge economy: we treat the quantified knowledge of a protein's function as the analogue of wealth and examine the distribution of information in a population of proteins in the proteome in the same manner that the distribution of wealth is studied in societies. Our results show a highly skewed distribution of information about human proteins over the last decade, in which the inequality in the annotations given to the proteins remains high. Additionally, we examine the correlation between the knowledge about protein function as captured in databases and the interest in proteins as reflected by mentions in the scientific literature. We show a large gap between knowledge and interest and dissect the factors leading to this gap.
In conclusion, our study shows that research efforts should be redirected to less studied proteins to mitigate the disparity among human proteins both in databases and literature.
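The wealth analogy in this abstract can be illustrated with a Gini coefficient computed over per-protein annotation counts. This is only a minimal sketch: the abstract does not name the inequality measure the authors used, so the choice of the Gini coefficient is an assumption, and the function name `gini` and the toy counts are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality).

    Assumption: each value is the amount of annotation attached to one
    protein; the study's actual metric may differ.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical annotation counts for four proteins.
    print(gini([1, 1, 1, 1]))   # evenly annotated proteins
    print(gini([0, 0, 0, 10]))  # all annotations on a single protein
```

A value near 0 would indicate evenly spread annotations, while a value approaching 1 would indicate that a few proteins hold most of the recorded knowledge.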
Affiliation(s)
- An Phan
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
  - Department of Mathematics, Iowa State University, Ames, IA, United States
- Parnal Joshi
  - Program in Bioinformatics and Computational Biology, Iowa State University, Ames, IA, United States
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, United States
- Claus Kadelka
  - Department of Mathematics, Iowa State University, Ames, IA, United States
- Iddo Friedberg
  - Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, United States
4
Tekpinar M, David L, Henry T, Carbone A. PRESCOTT: a population aware, epistatic, and structural model accurately predicts missense effects. Genome Biol 2025; 26:113. [PMID: 40329382 PMCID: PMC12054230 DOI: 10.1186/s13059-025-03581-y] [Received: 03/24/2024] [Accepted: 04/17/2025] [Indexed: 05/08/2025] [Open Access]
Abstract
Predicting the functional impact of point mutations is a critical challenge in genomics. PRESCOTT reconstructs complete mutational landscapes, identifies mutation-sensitive regions, and categorizes missense variants as benign, pathogenic, or variants of uncertain significance. Leveraging protein sequences, structural models, and population-specific allele frequencies, PRESCOTT surpasses existing methods in classifying ClinVar variants, the ACMG dataset, and over 1800 proteins from the Human Protein Dataset. Its online server facilitates mutation effect predictions for any protein and variant, and includes a database of over 19,000 human proteins, ready for population-specific analyses. Open access to residue-specific scores offers transparency and valuable insights for genomic medicine.
Affiliation(s)
- Mustafa Tekpinar
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France
- Laurent David
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France
- Thomas Henry
  - Centre International de Recherche en Infectiologie (CIRI), Inserm U1111, Université Claude Bernard Lyon 1, CNRS, UMR5308, ENS de Lyon, Univ Lyon, Lyon, 69007, France
- Alessandra Carbone
  - Department of Computational, Quantitative and Synthetic Biology (CQSB), Sorbonne Université, CNRS, IBPS, UMR 7238, Paris, 75005, France
  - Institut Universitaire de France (IUF), Paris, France
5
Lakshman AH, Wright ES. EvoWeaver: large-scale prediction of gene functional associations from coevolutionary signals. Nat Commun 2025; 16:3878. [PMID: 40274827 PMCID: PMC12022180 DOI: 10.1038/s41467-025-59175-6] [Received: 02/05/2025] [Accepted: 04/09/2025] [Indexed: 04/26/2025] [Open Access]
Abstract
The known universe of uncharacterized proteins is expanding far faster than our ability to annotate their functions through laboratory study. Computational annotation approaches rely on similarity to previously studied proteins, thereby ignoring unstudied proteins. Coevolutionary approaches hold promise for injecting new information into our knowledge of the protein universe by linking proteins through 'guilt-by-association'. However, existing coevolutionary algorithms have insufficient accuracy and scalability to connect the entire universe of proteins. We present EvoWeaver, a method that weaves together 12 signals of coevolution to quantify the degree of shared evolution between genes. EvoWeaver accurately identifies proteins involved in protein complexes or separate steps of a biochemical pathway. We show the merits of EvoWeaver by partly reconstructing known biochemical pathways without any prior knowledge other than that available from genomic sequences. Applying EvoWeaver to 1545 gene groups from 8564 genomes reveals missing connections in popular databases and potentially undiscovered links between proteins.
Affiliation(s)
- Aidan H Lakshman
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Erik S Wright
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Center for Evolutionary Biology and Medicine, Pittsburgh, PA, USA
6
Lemke MC, Avala NR, Rader MT, Hargett SR, Lank DS, Seltzer BD, Harris TE. MAST Kinases' Function and Regulation: Insights from Structural Modeling and Disease Mutations. Biomedicines 2025; 13:925. [PMID: 40299535 PMCID: PMC12024977 DOI: 10.3390/biomedicines13040925] [Received: 01/22/2025] [Revised: 04/01/2025] [Accepted: 04/03/2025] [Indexed: 04/30/2025] [Open Access]
Abstract
Background/Objectives: The MAST kinases are ancient AGC kinases associated with many human diseases, such as cancer, diabetes, and neurodevelopmental disorders. We set out to describe the origins and diversification of MAST kinases from a structural and bioinformatic perspective to inform future research directions. Methods: We investigated MAST-lineage kinases using database and sequence analysis. We also estimate the functional consequences of disease point mutations on protein stability by integrating predictive algorithms and AlphaFold. Results: Higher-order organisms often have multiple MASTs and a single MASTL kinase. MAST proteins conserve an AGC kinase domain, a domain of unknown function 1908 (DUF), and a PDZ binding domain. D. discoideum contains MAST kinase-like proteins that exhibit a characteristic insertion within the T-loop but do not conserve DUF or PDZ domains. While the DUF domain is conserved in plants, the PDZ domain is not. The four mammalian MASTs demonstrate tissue expression heterogeneity by mRNA and protein. MAST1-4 are likely regulated by 14-3-3 proteins based on interactome data and in silico predictions. Comparative ΔΔG estimation identified that MAST1-L232P and G522E mutations are likely destabilizing. Conclusions: We conclude that MAST and MASTL kinases diverged from the primordial MAST, which likely operated in both biological niches. The number of MAST paralogs then expanded to the heterogeneous subfamily seen in mammals that are all likely regulated by 14-3-3 protein interaction. The reported pathogenic mutations in MASTs primarily represent alterations to post-translational modification topology in the DUF and kinase domains. Our report outlines a computational basis for future work in MAST kinase regulation and drug discovery.
Affiliation(s)
- Thurl E. Harris
  - Department of Pharmacology, University of Virginia, Charlottesville, VA 22903, USA
7
Feuermann M, Mi H, Gaudet P, Muruganujan A, Lewis SE, Ebert D, Mushayahama T, Thomas PD. A compendium of human gene functions derived from evolutionary modelling. Nature 2025; 640:146-154. [PMID: 40011791 PMCID: PMC11964926 DOI: 10.1038/s41586-025-08592-0] [Received: 10/02/2023] [Accepted: 01/03/2025] [Indexed: 02/28/2025]
Abstract
A comprehensive, computable representation of the functional repertoire of all macromolecules encoded within the human genome is a foundational resource for biology and biomedical research. The Gene Ontology Consortium has been working towards this goal by generating a structured body of information about gene functions, which now includes experimental findings reported in more than 175,000 publications for human genes and genes in experimentally tractable model organisms [1,2]. Here, we describe the results of a large, international effort to integrate all of these findings to create a representation of human gene functions that is as complete and accurate as possible. Specifically, we apply an expert-curated, explicit evolutionary modelling approach to all human protein-coding genes. This approach integrates available experimental information across families of related genes into models that reconstruct the gain and loss of functional characteristics over evolutionary time. The models and the resulting set of 68,667 integrated gene functions cover approximately 82% of human protein-coding genes. The functional repertoire reveals a marked preponderance of molecular regulatory functions, and the models provide insights into the evolutionary origins of human gene functions. We show that our set of descriptions of functions can improve the widely used genomic technique of Gene Ontology enrichment analysis. The experimental evidence for each functional characteristic is recorded, thereby enabling the scientific community to help review and improve the resource, which we have made publicly available.
Affiliation(s)
- Marc Feuermann
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Huaiyu Mi
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California Los Angeles, Los Angeles, CA, USA
- Pascale Gaudet
  - Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland
- Anushya Muruganujan
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California Los Angeles, Los Angeles, CA, USA
- Suzanna E Lewis
  - Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Dustin Ebert
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California Los Angeles, Los Angeles, CA, USA
- Tremayne Mushayahama
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California Los Angeles, Los Angeles, CA, USA
- Paul D Thomas
  - Division of Bioinformatics, Department of Population and Public Health Sciences, University of Southern California Los Angeles, Los Angeles, CA, USA
8
Suzuki T, Bono H. Pipeline to explore information on genome editing using large language models and genome editing meta-database. Database (Oxford) 2025; 2025:baaf022. [PMID: 40056431 PMCID: PMC11890094 DOI: 10.1093/database/baaf022] [Received: 10/18/2024] [Revised: 02/09/2025] [Accepted: 02/19/2025] [Indexed: 03/10/2025]
Abstract
Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit depending on some factors such as the type of species, sequences, and GE tools. Therefore, confirming the presence or absence of GE practices in previous publications is crucial for the effective designing and establishment of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide as comprehensive GE information as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method for extracting essential GE information using large language models from the information based on GEM and GE-related articles. This approach allows for a systematic and efficient investigation of GE information that cannot be achieved using the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.
Affiliation(s)
- Takayuki Suzuki
  - Graduate School of Integrated Sciences for Life, Hiroshima University, 3-10-23 Kagamiyama, Higashi-Hiroshima 739-0046, Japan
- Hidemasa Bono
  - Graduate School of Integrated Sciences for Life, Hiroshima University, 3-10-23 Kagamiyama, Higashi-Hiroshima 739-0046, Japan
  - Genome Editing Innovation Center, Hiroshima University, 3-10-23 Kagamiyama, Higashi-Hiroshima 739-0046, Japan
9
Lateef Junaid MA. Artificial intelligence driven innovations in biochemistry: A review of emerging research frontiers. Biomolecules & Biomedicine 2025; 25:739-750. [PMID: 39819459 PMCID: PMC11959397 DOI: 10.17305/bb.2024.11537] [Received: 10/24/2024] [Revised: 12/15/2024] [Accepted: 12/15/2024] [Indexed: 01/19/2025]
Abstract
Artificial intelligence (AI) has become a powerful tool in biochemistry, greatly enhancing research capabilities by enabling the analysis of complex datasets, predicting molecular interactions, and accelerating drug discovery. As AI continues to evolve, its applications in biochemistry are poised to expand, revolutionizing both theoretical and applied research. This review explores current and potential AI applications in biochemistry, with a focus on data analysis, molecular modeling, enzyme engineering, and metabolic pathway studies. Key AI techniques, such as machine learning algorithms, natural language processing, and AI-based molecular modeling, are discussed. The review also highlights emerging research areas benefiting from AI, including personalized medicine and synthetic biology. The methodology involves an extensive analysis of existing literature, particularly peer-reviewed studies on AI applications in biochemistry. AI-driven tools like AlphaFold, which have significantly advanced protein structure prediction, are evaluated alongside AI's role in expediting drug discovery. The review also addresses challenges such as data quality, model interpretability, and ethical considerations. Results indicate that AI has expanded the scope of biochemical research by facilitating large-scale data analysis, enhancing molecular simulations, and opening new avenues of inquiry. However, challenges remain, particularly in data handling and ethical concerns. In conclusion, AI is transforming biochemistry by driving innovation and expanding research possibilities. Future advancements in AI algorithms, interdisciplinary collaboration, and integration with automated techniques will be crucial to fully unlocking AI's potential in advancing biochemical research.
Affiliation(s)
- Mohammed Abdul Lateef Junaid
  - Department of Basic Medical Sciences, College of Medicine, Majmaah University, Al Majmaah, Kingdom of Saudi Arabia
10
Luo J, Luo Y. Learning maximally spanning representations improves protein function annotation. bioRxiv 2025:2025.02.13.638156. [PMID: 40027840 PMCID: PMC11870436 DOI: 10.1101/2025.02.13.638156] [Indexed: 03/05/2025]
Abstract
Automated protein function annotation is a fundamental problem in computational biology, crucial for understanding the functional roles of proteins in biological processes, with broad implications in medicine and biotechnology. A persistent challenge in this problem is the imbalanced, long-tail distribution of available function annotations: a small set of well-studied function classes account for most annotated proteins, while many other classes have few annotated proteins, often due to investigative bias, experimental limitations, or intrinsic biases in protein evolution. As a result, existing machine learning models for protein function prediction tend to only optimize the prediction accuracy for well-studied function classes overrepresented in the training data, leading to poor accuracy for understudied functions. In this work, we develop MSRep, a novel deep learning-based protein function annotation framework designed to address this imbalance issue and improve annotation accuracy. MSRep is inspired by an intriguing phenomenon, called neural collapse (NC), commonly observed in high-accuracy deep neural networks used for classification tasks, where hidden representations in the final layer collapse to class-specific mean embeddings, while maintaining maximal inter-class separation. Given that NC consistently emerges across diverse architectures and tasks for high-accuracy models, we hypothesize that inducing NC structure in models trained on imbalanced data can enhance both prediction accuracy and generalizability. To achieve this, MSRep refines a pre-trained protein language model to produce NC-like representations by optimizing an NC-inspired loss function, which ensures that minority functions are equally represented in the embedding space as majority functions, in contrast to conventional classification methods whose embedding spaces are dominated by overrepresented classes. 
In evaluations across four protein function annotation tasks on the prediction of Enzyme Commission numbers, Gene3D codes, Pfam families, and Gene Ontology terms, MSRep demonstrates superior predictive performance for both well- and underrepresented classes, outperforming several state-of-the-art annotation tools. We anticipate that MSRep will enhance the annotation of understudied functions and novel, uncharacterized proteins, advancing future protein function studies and accelerating the discovery of new functional proteins. The source code of MSRep is available at https://github.com/luo-group/MSRep.
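The neural collapse geometry mentioned in this abstract, class means that are maximally and equally separated, is commonly described as a simplex equiangular tight frame (ETF). The sketch below constructs such a frame for K classes and checks its defining property, a pairwise cosine of -1/(K-1). It illustrates the target geometry only and is not MSRep's loss function or code; the helper names `simplex_etf` and `cosine` are hypothetical.

```python
import math


def simplex_etf(k):
    """Return k unit vectors in R^k forming a simplex ETF (requires k >= 2).

    Row i is sqrt(k / (k - 1)) * (e_i - (1/k) * ones), i.e. the i-th
    standard basis vector centred and rescaled to unit length.
    """
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    means = simplex_etf(4)  # four hypothetical function classes
    # Every pair of class means has the same cosine, -1 / (k - 1).
    print(cosine(means[0], means[1]), -1 / 3)
```

Because every class mean has the same norm and the same angle to every other class mean, rare classes occupy as much of the embedding space as frequent ones, which is the property the abstract contrasts with embeddings dominated by overrepresented classes.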
Affiliation(s)
- Jiaqi Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology
- Yunan Luo
  - School of Computational Science and Engineering, Georgia Institute of Technology
11
Torres-Rodríguez JV, Li D, Schnable JC. Evolving best practices for transcriptome-wide association studies accelerate discovery of gene-phenotype links. Current Opinion in Plant Biology 2025; 83:102670. [PMID: 39626491 DOI: 10.1016/j.pbi.2024.102670] [Received: 07/01/2024] [Revised: 10/20/2024] [Accepted: 11/01/2024] [Indexed: 02/01/2025]
Abstract
Transcriptome-wide association studies (TWAS) complement genome-wide association studies (GWAS) by using gene expression data to link specific genes to phenotypes. This review examines 37 TWAS studies across eight plant species, evaluating the impact of methodological choices on outcomes using maize and soybean datasets. Large sample sizes and synchronized sample collection for gene expression measurement appear to significantly increase power for discovering gene-phenotype linkages, while matching tissue, stage, and environment may matter much less than previously believed, making it feasible to reuse large and well-collected expression datasets across multiple studies. The development of statistical approaches and computational tools specifically optimized for plant TWAS data will ultimately be needed, but further potential remains to adapt advances developed in GWAS to TWAS contexts.
Affiliation(s)
- J Vladimir Torres-Rodríguez
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Delin Li
  - Xianghu Laboratory, Hangzhou, 311231, China
- James C Schnable
  - Quantitative Life Sciences Initiative, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; Department of Agronomy and Horticulture, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
12
Pir MS, Timucin E. AFFIPred: AlphaFold2 structure-based Functional Impact Prediction of missense variations. Protein Sci 2025; 34:e70030. [PMID: 39840793 PMCID: PMC11751861 DOI: 10.1002/pro.70030] [Received: 08/05/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 01/23/2025]
Abstract
Protein structure holds immense potential for pathogenicity prediction, although structure-based predictors are limited compared to their sequence-based counterparts due to the "structure knowledge gap" between the large number of available protein sequences and the relatively limited number of structures. Leveraging the highly accurate protein structures predicted by AlphaFold2 (AF2), we introduce AFFIPred, an ensemble machine learning classifier that combines sequence and AF2-based structural characteristics to predict missense variant pathogenicity. Based on assessments on unseen datasets, AFFIPred reached a level of performance comparable to state-of-the-art predictors such as AlphaMissense. We also showed that using AF2 structures, which are full-length and represent the unbound states, ensures more precise solvent-accessible surface area (SASA) calculations than using experimental structures. In line with the completeness of the AF2 structures, their use provides a more comprehensive view of the structural characteristics of the missense variation datasets by capturing all variants. AFFIPred maintains high-level accuracy without the limitations of PDB-based classifiers. AFFIPred has predicted over 210 million variations of the human proteome, which are accessible at https://affipred.timucinlab.com/.
Affiliation(s)
- Mustafa S Pir
  - Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Atasehir, Istanbul, Turkey
- Emel Timucin
  - Department of Biostatistics and Bioinformatics, Institute of Health Sciences, Acibadem University, Atasehir, Istanbul, Turkey
  - Department of Biostatistics and Medical Informatics, School of Medicine, Acibadem University, Atasehir, Istanbul, Turkey
13
Sunil RS, Lim SC, Itharajula M, Mutwil M. The gene function prediction challenge: Large language models and knowledge graphs to the rescue. Current Opinion in Plant Biology 2024; 82:102665. [PMID: 39579414 DOI: 10.1016/j.pbi.2024.102665] [Received: 08/13/2024] [Revised: 10/23/2024] [Accepted: 10/24/2024] [Indexed: 11/25/2024]
Abstract
Elucidating gene function is one of the ultimate goals of plant science. Despite this, only ~15% of all genes in the model plant Arabidopsis thaliana have comprehensively experimentally verified functions. While bioinformatic gene function prediction approaches can guide biologists in their experimental efforts, neither the performance of gene function prediction methods nor the number of experimentally characterized genes has increased dramatically in recent years. In this review, we will discuss the status quo and the trajectory of gene function elucidation and outline the recent advances in gene function prediction approaches. We will then discuss how recent artificial intelligence advances in large language models and knowledge graphs can be leveraged to accelerate gene function predictions and keep us updated with scientific literature.
Affiliation(s)
- Rohan Shawn Sunil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Shan Chun Lim
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Manoj Itharajula
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
- Marek Mutwil
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore
14
Lloyd KCK. Commentary: The International Mouse Phenotyping Consortium: high-throughput in vivo functional annotation of the mammalian genome. Mamm Genome 2024; 35:537-543. [PMID: 39254744 PMCID: PMC11522054 DOI: 10.1007/s00335-024-10068-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Accepted: 08/30/2024] [Indexed: 09/11/2024]
Abstract
The International Mouse Phenotyping Consortium (IMPC) is a worldwide effort producing and phenotyping knockout mouse lines to expose the pathophysiological roles of all genes in human diseases and make mice and data available and accessible to the global research community. It has created new knowledge on the function of thousands of genes for which little to nothing was known. This new knowledge has informed the genetic basis of rare diseases, posited gene product influences on common diseases, influenced research on targeted therapies, revealed functional pleiotropy, essentiality, and sexual dimorphism, and provided many more insights into the role of genes in health and disease. Its scientific contributions have been many and widespread; however, there remain thousands of "dark" genes yet to be illuminated. Nearing the end of its current funding cycle, IMPC is at a crossroads. The vision forward is clear, the path to proceed less so.
Affiliation(s)
- K C Kent Lloyd
- Department of Surgery, School of Medicine, University of California, Davis, California, USA
- Mouse Biology Program, University of California, Davis, California, USA
15
Grady SK, Peterson KA, Murray SA, Baker EJ, Langston MA, Chesler EJ. A graph theoretical approach to experimental prioritization in genome-scale investigations. Mamm Genome 2024; 35:724-733. [PMID: 39191873 PMCID: PMC11522061 DOI: 10.1007/s00335-024-10066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
The goal of systems biology is to gain a network-level understanding of how gene interactions influence biological states, and ultimately to inform our understanding of human disease. Given the scale and scope of systems biology studies, resource constraints often limit researchers when validating genome-wide phenomena and potentially lead to an incomplete understanding of the underlying mechanisms. Further, prioritization strategies are often biased towards known entities (e.g. previously studied genes/proteins with commercially available reagents), and other technical issues also limit experimental breadth. Here, heterogeneous biological information is modeled as an association graph to which a high-performance minimum dominating set solver is applied to maximize coverage across the graph, and thus increase the breadth of experimentation. First, we tested our model on retrieval of existing gene functional annotations and demonstrated that minimum dominating set returns more diverse terms when compared to other computational methods. Next, we utilized our heterogeneous network and minimum dominating set solver to assist in the process of identifying understudied genes to be interrogated by the International Mouse Phenotyping Consortium. Using an unbiased algorithmic strategy, poorly studied genes are prioritized from the remaining thousands of genes yet to be characterized. This method is tunable and extensible with the potential to incorporate additional user-defined prioritizing information. The minimum dominating set approach can be applied to any biological network in order to identify a tractable subset of features to test experimentally or to assist in prioritizing candidate genes associated with human disease.
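The coverage idea in this abstract can be illustrated with a small sketch. A dominating set is a set of nodes such that every node in the graph is either in the set or adjacent to a member of it. The paper applies a high-performance minimum dominating set solver; the code below swaps in the standard greedy approximation instead, which does not guarantee a minimum set. The function name `greedy_dominating_set`, the toy graph, and the gene labels are hypothetical and serve only as illustration.

```python
def greedy_dominating_set(graph):
    """Greedy approximation of a dominating set (not guaranteed minimum).

    graph: dict mapping each node to the set of its neighbours (undirected).
    Returns the chosen nodes in the order they were picked.
    """
    undominated = set(graph)
    chosen = []
    while undominated:
        # Pick the node whose closed neighbourhood covers the most
        # still-undominated nodes; sorted() makes tie-breaking deterministic.
        best = max(sorted(graph),
                   key=lambda n: len(({n} | graph[n]) & undominated))
        chosen.append(best)
        undominated -= {best} | graph[best]
    return chosen


# Hypothetical association graph: nodes stand for genes, edges for associations.
toy_graph = {
    "g1": {"g2", "g3"},
    "g2": {"g1"},
    "g3": {"g1", "g4"},
    "g4": {"g3", "g5"},
    "g5": {"g4"},
}
print(greedy_dominating_set(toy_graph))
```

Every gene in the toy graph is either selected or adjacent to a selected gene, which is the sense in which a small selected subset "covers" the network.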
Affiliation(s)
- Stephen K Grady
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA
- Erich J Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
- Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
16
Madsen EB, Andersen JP. Funding priorities and health outcomes in Danish medical research. Soc Sci Med 2024; 360:117347. [PMID: 39299153 DOI: 10.1016/j.socscimed.2024.117347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 09/09/2024] [Accepted: 09/13/2024] [Indexed: 09/22/2024]
Abstract
External research funding is an essential component of the infrastructure of modern, academic research. Priorities in funding decisions drive what knowledge is generated, and how scientists' careers are shaped. For health research, it can ultimately have implications for health outcomes. The aim of this paper is to illustrate how funding information can be used to track priorities in health research, linking them to disease burdens and research outputs. Furthermore, funding concentrations are analysed from both researcher and disease perspectives, to estimate the influence of personal Matthew-effects on the distribution of health research funding. Denmark is used as the case, including funding information from all major public and private research foundations in the period 2004-2016. Grant information is linked to research outputs and disability-adjusted life-years (DALY rates), for 34,160 publications linked to 2630 grants, receiving DKK 4.8 billion in funding. Data show poor correlation between funding priorities, research activity and disease burdens, with several diseases receiving disproportionate amounts of funding. A research opportunity index is calculated to identify diseases with the highest potential for future investments from a burden-centred point of view. Funding is highly concentrated, both on people and on specific diseases. High funding concentrations on researchers can be a driving factor behind the observed funding-to-burden imbalances, and may risk knowledge stagnation through monopolisation of the market place of ideas. Results indicate that funders of clinical and translational research, as well as some types of biomedical research, need to supplement traditional considerations of scientific excellence with measures of societal challenges and relevance.
Affiliation(s)
- Emil Bargmann Madsen
- Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Bartholins Allé 7, DK-8000, Aarhus C, Denmark
- Jens Peter Andersen
- Danish Centre for Studies in Research and Research Policy, Department of Political Science, Aarhus University, Bartholins Allé 7, DK-8000, Aarhus C, Denmark
17
Weber CJ, Weitzel AJ, Liu AY, Gacasan EG, Sah RL, Cooper KL. Cellular and molecular mechanisms that shape the development and evolution of tail vertebral proportion in mice and jerboas. bioRxiv 2024:2024.10.25.620311. [PMID: 39484405 PMCID: PMC11527341 DOI: 10.1101/2024.10.25.620311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2024]
Abstract
Despite the functional importance of the vertebral skeleton, little is known about how individual vertebrae elongate or achieve disproportionate lengths as in the giraffe neck. Rodent tails are an abundantly diverse and more tractable system to understand mechanisms of vertebral growth and proportion. In many rodents, disproportionately long mid-tail vertebrae form a 'crescendo-decrescendo' of lengths in the tail series. In bipedal jerboas, these vertebrae grow exceptionally long such that the adult tail is 1.5x the length of a mouse tail, relative to body length, with four fewer vertebrae. How do vertebrae with the same regional identity elongate differently from their neighbors to establish and diversify adult proportion? Here, we find that vertebral lengths are largely determined by differences in growth cartilage height and the number of cells progressing through endochondral ossification. Hypertrophic chondrocyte size, a major contributor to differential elongation in mammal limb bones, differs only in the longest jerboa mid-tail vertebrae where they are exceptionally large. To uncover candidate molecular mechanisms of disproportionate vertebral growth, we performed intersectional RNA-Seq of mouse and jerboa tail vertebrae with similar and disproportionate elongation rates. Many regulators of posterior axial identity and endochondral elongation are disproportionately differentially expressed in jerboa vertebrae. Among these, the inhibitory natriuretic peptide receptor C (NPR3) appears in multiple studies of rodent and human skeletal proportion suggesting it refines local growth rates broadly in the skeleton and broadly in mammals. Consistent with this hypothesis, NPR3 loss of function mice have abnormal tail and limb proportions. Therefore, in addition to genetic components of the complex process of vertebral evolution, these studies reveal fundamental mechanisms of skeletal growth and proportion.
Affiliation(s)
- Ceri J Weber
- Department of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Alexander J Weitzel
- Department of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Alexander Y Liu
- Department of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Erica G Gacasan
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Robert L Sah
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, California, USA
- Kimberly L Cooper
- Department of Cell and Developmental Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA
18
Jones RA, Cooper F, Kelly G, Barry D, Renshaw MJ, Sapkota G, Smith JC. Zebrafish reveal new roles for Fam83f in hatching and the DNA damage-mediated autophagic response. Open Biol 2024; 14:240194. [PMID: 39437839 PMCID: PMC11495952 DOI: 10.1098/rsob.240194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 07/24/2024] [Accepted: 07/25/2024] [Indexed: 10/25/2024] Open
Abstract
The FAM83 (Family with sequence similarity 83) family is highly conserved in vertebrates, but little is known of the functions of these proteins beyond their association with oncogenesis. Of the family, FAM83F is of particular interest because it is the only membrane-targeted FAM83 protein. When overexpressed, FAM83F activates the canonical Wnt signalling pathway and binds to and stabilizes p53; it therefore interacts with two pathways often dysregulated in disease. Insights into gene function can often be gained by studying the roles genes play during development, and here we report the generation of fam83f knock-out (KO) zebrafish, which we have used to study the role of Fam83f in vivo. We show that endogenous fam83f is most strongly expressed in the hatching gland of developing zebrafish embryos, and that fam83f KO embryos hatch earlier than their wild-type (WT) counterparts, despite developing at a comparable rate. We also demonstrate that fam83f KO embryos are more sensitive to ionizing radiation than WT embryos, an unexpected finding bearing in mind the previously reported ability of FAM83F to stabilize p53. Transcriptomic analysis shows that loss of fam83f leads to downregulation of phosphatidylinositol-3-phosphate (PI(3)P) binding proteins and impairment of cellular degradation pathways, particularly autophagy, a crucial component of the DNA damage response. Finally, we show that Fam83f protein is itself targeted to the lysosome when overexpressed in HEK293T cells, and that this localization is dependent upon a C-terminal signal sequence. The zebrafish lines we have generated suggest that Fam83f plays an important role in autophagic/lysosomal processes, resulting in dysregulated hatching and increased sensitivity to genotoxic stress in vivo.
Affiliation(s)
- Rebecca A. Jones
- Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
- Fay Cooper
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
- Neuroscience Institute, University of Sheffield, Sheffield S10 2TN, UK
- Gavin Kelly
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- David Barry
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
- Gopal Sapkota
- MRC Protein Phosphorylation and Ubiquitylation Unit, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, UK
- James C. Smith
- The Francis Crick Institute, 1 Midland Road, London NW1 1AT, UK
19
Pividori M, Ritchie MD, Milone DH, Greene CS. An efficient, not-only-linear correlation coefficient based on clustering. Cell Syst 2024; 15:854-868.e3. [PMID: 39243756 PMCID: PMC11951854 DOI: 10.1016/j.cels.2024.08.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Revised: 06/18/2024] [Accepted: 08/15/2024] [Indexed: 09/09/2024]
Abstract
Identifying meaningful patterns in data is crucial for understanding complex biological processes, particularly in transcriptomics, where genes with correlated expression often share functions or contribute to disease mechanisms. Traditional correlation coefficients, which primarily capture linear relationships, may overlook important nonlinear patterns. We introduce the clustermatch correlation coefficient (CCC), a not-only-linear coefficient that utilizes clustering to efficiently detect both linear and nonlinear associations. CCC outperforms standard methods by revealing biologically meaningful patterns that linear-only coefficients miss and is faster than state-of-the-art coefficients such as the maximal information coefficient. When applied to human gene expression data from genotype-tissue expression (GTEx), CCC identified robust linear relationships and nonlinear patterns, such as sex-specific differences, that are undetectable by standard methods. Highly ranked gene pairs were enriched for interactions in integrated networks built from protein-protein interactions, transcription factor regulation, and chemical and genetic perturbations, suggesting that CCC can detect functional relationships missed by linear-only approaches. CCC is a highly efficient, next-generation, not-only-linear correlation coefficient for genome-scale data. A record of this paper's transparent peer review process is included in the supplemental information.
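To make the clustering-based idea concrete, here is a simplified sketch of how a coefficient of this kind can pick up a nonlinear pattern. It is not the authors' implementation. The abstract only states that clustering is used; the rank-based quantile binning, the adjusted Rand index as the agreement measure, the small range of cluster counts, and the names `quantile_labels`, `adjusted_rand_index` and `ccc_sketch` are all assumptions of this illustration.

```python
from collections import Counter
from math import comb


def quantile_labels(values, k):
    """Assign each value to one of k roughly equal-sized rank bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, idx in enumerate(order):
        labels[idx] = rank * k // len(values)
    return labels


def adjusted_rand_index(a, b):
    """Chance-corrected agreement between two labelings of the same items."""
    sum_ab = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 0.0
    return (sum_ab - expected) / (max_index - expected)


def ccc_sketch(x, y, max_k=4):
    """Largest agreement over all pairs of binnings of x and y, floored at 0."""
    best = 0.0
    for kx in range(2, max_k + 1):
        for ky in range(2, max_k + 1):
            score = adjusted_rand_index(quantile_labels(x, kx),
                                        quantile_labels(y, ky))
            best = max(best, score)
    return best


# A symmetric quadratic relationship: the linear (Pearson) correlation is zero
# here, but the binned partitions of x and y still agree more than chance.
x = list(range(-10, 11))
y = [v * v for v in x]
print(ccc_sketch(x, y))
```

Because the score compares partitions rather than fitting a line, any relationship that makes the groupings of one variable predictable from the other can register, which is the "not-only-linear" behaviour the abstract describes.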
Affiliation(s)
- Milton Pividori
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA; Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marylyn D Ritchie
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Diego H Milone
- Research Institute for Signals, Systems and Computational Intelligence (sinc(i)), Universidad Nacional del Litoral, Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Santa Fe CP3000, Argentina
- Casey S Greene
- Department of Biomedical Informatics, University of Colorado School of Medicine, Aurora, CO 80045, USA; Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
20
Barrios-Núñez I, Martínez-Redondo G, Medina-Burgos P, Cases I, Fernández R, Rojas A. Decoding functional proteome information in model organisms using protein language models. NAR Genom Bioinform 2024; 6:lqae078. [PMID: 38962255 PMCID: PMC11217674 DOI: 10.1093/nargab/lqae078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2024] [Revised: 05/31/2024] [Accepted: 06/26/2024] [Indexed: 07/05/2024] Open
Abstract
Protein language models have been tested and proved to be reliable when used on curated datasets but have not yet been applied to full proteomes. Accordingly, we tested how two different machine learning-based methods performed when decoding functional information from the proteomes of selected model organisms. We found that protein language models are more precise and informative than deep learning methods for all the species tested and across the three gene ontologies studied, and that they better recover functional information from transcriptomic experiments. The results obtained indicate that these language models are likely to be suitable for large-scale annotation and downstream analyses, and we recommend a guide for their use.
Affiliation(s)
- Israel Barrios-Núñez
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Patricia Medina-Burgos
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Ildefonso Cases
- Bioinformatics Unit, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
- Rosa Fernández
- Metazoa Phylogenomics Lab, Institute of Evolutionary Biology (CSIC-UPF), 08003 Barcelona, Spain
- Ana M Rojas
- Computational Biology and Bioinformatics Group, Andalusian Center for Developmental Biology (CABD-CSIC), 41013 Sevilla, Spain
21
Correa Marrero M, Jänes J, Baptista D, Beltrao P. Integrating Large-Scale Protein Structure Prediction into Human Genetics Research. Annu Rev Genomics Hum Genet 2024; 25:123-140. [PMID: 38621234 DOI: 10.1146/annurev-genom-120622-020615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/17/2024]
Abstract
The last five years have seen impressive progress in deep learning models applied to protein research. Most notably, sequence-based structure predictions have seen transformative gains in the form of AlphaFold2 and related approaches. Millions of missense protein variants in the human population lack annotations, and these computational methods are a valuable means to prioritize variants for further analysis. Here, we review the recent progress in deep learning models applied to the prediction of protein structure and protein variants, with particular emphasis on their implications for human genetics and health. Improved prediction of protein structures facilitates annotations of the impact of variants on protein stability, protein-protein interaction interfaces, and small-molecule binding pockets. Moreover, it contributes to the study of host-pathogen interactions and the characterization of protein function. As genome sequencing in large cohorts becomes increasingly prevalent, we believe that better integration of state-of-the-art protein informatics technologies into human genetics research is of paramount importance.
Affiliation(s)
- Miguel Correa Marrero
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Jürgen Jänes
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
- Pedro Beltrao
- Instituto Gulbenkian de Ciência, Oeiras, Portugal
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Institute of Molecular Systems Biology, Department of Biology, ETH Zurich, Zurich, Switzerland
22
Shilts J, Wright GJ. Mapping the Human Cell Surface Interactome: A Key to Decode Cell-to-Cell Communication. Annu Rev Biomed Data Sci 2024; 7:155-177. [PMID: 38723658 DOI: 10.1146/annurev-biodatasci-102523-103821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/25/2024]
Abstract
Proteins on the surfaces of cells serve as physical connection points to bridge one cell with another, enabling direct communication between cells and cohesive structure. As biomedical research makes the leap from characterizing individual cells toward understanding the multicellular organization of the human body, the binding interactions between molecules on the surfaces of cells are foundational both for computational models and for clinical efforts to exploit these influential receptor pathways. To achieve this grander vision, we must assemble the full interactome of ways surface proteins can link together. This review investigates how close we are to knowing the human cell surface protein interactome. We summarize the current state of databases and systematic technologies to assemble surface protein interactomes, while highlighting substantial gaps that remain. We aim for this to serve as a road map for eventually building a more robust picture of the human cell surface protein interactome.
Affiliation(s)
- Jarrod Shilts
- Department of Biology, Hull York Medical School, York Biomedical Research Institute, University of York, York, United Kingdom
- School of the Biological Sciences, University of Cambridge, Cambridge, United Kingdom
- Gavin J Wright
- Department of Biology, Hull York Medical School, York Biomedical Research Institute, University of York, York, United Kingdom
23
Arowolo O, Suvorov A. Underexplored Molecular Mechanisms of Toxicity. J Xenobiot 2024; 14:939-949. [PMID: 39051348 PMCID: PMC11270369 DOI: 10.3390/jox14030052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Revised: 07/01/2024] [Accepted: 07/15/2024] [Indexed: 07/27/2024] Open
Abstract
Social biases may concentrate the attention of researchers on a small number of well-known molecules/mechanisms leaving others underexplored. In accordance with this view, central to mechanistic toxicology is a narrow range of molecular pathways that are assumed to be involved in a significant part of the responses to toxicity. It is unclear, however, if there are other molecular mechanisms which play an important role in toxicity events but are overlooked by toxicology. To identify overlooked genes sensitive to chemical exposures, we used publicly available databases. First, we used data on the published chemical-gene interactions for 17,338 genes to estimate their sensitivity to chemical exposures. Next, we extracted data on publication numbers per gene for 19,243 human genes from the Find My Understudied Genes database. Thresholds were applied to both datasets using our algorithm to identify chemically sensitive and chemically insensitive genes and well-studied and underexplored genes. A total of 1110 underexplored genes highly sensitive to chemical exposures were used in GSEA and Shiny GO analyses to identify enriched biological categories. The metabolism of fatty acids, amino acids, and glucose were identified as underexplored molecular mechanisms sensitive to chemical exposures. These findings suggest that future effort is needed to uncover the role of xenobiotics in the current epidemics of metabolic diseases.
Affiliation(s)
- Alexander Suvorov
- Department of Environmental Health Sciences, School of Public Health and Health Sciences, University of Massachusetts, 686 North Pleasant Street, Amherst, MA 01003, USA
24
Bailey JK, Ma D, Clegg DO. Initial Characterization of WDR5B Reveals a Role in the Proliferation of Retinal Pigment Epithelial Cells. Cells 2024; 13:1189. [PMID: 39056772 PMCID: PMC11275010 DOI: 10.3390/cells13141189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/10/2024] [Revised: 07/04/2024] [Accepted: 07/08/2024] [Indexed: 07/28/2024] Open
Abstract
The chromatin-associated protein WDR5 has been widely studied due to its role in histone modification and its potential as a pharmacological target for the treatment of cancer. In humans, the protein with highest sequence homology to WDR5 is encoded by the retrogene WDR5B, which remains unexplored. Here, we used CRISPR-Cas9 genome editing to generate WDR5B knockout and WDR5B-FLAG knock-in cell lines for further characterization. In contrast to WDR5, WDR5B exhibits low expression in pluripotent cells and is upregulated upon neural differentiation. Loss or shRNA depletion of WDR5B impairs cell growth and increases the fraction of non-viable cells in proliferating retinal pigment epithelial (RPE) cultures. CUT&RUN chromatin profiling in RPE and neural progenitors indicates minimal WDR5B enrichment at established WDR5 binding sites. These results suggest that WDR5 and WDR5B exhibit several divergent biological properties despite sharing a high degree of sequence homology.
Affiliation(s)
- Jeffrey K. Bailey
- Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA 93106, USA
- Center for Stem Cell Biology and Engineering, University of California, Santa Barbara, CA 93106, USA
- Dzwokai Ma
- Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA 93106, USA
- Dennis O. Clegg
- Department of Molecular, Cellular and Developmental Biology, Neuroscience Research Institute, University of California, Santa Barbara, CA 93106, USA
- Center for Stem Cell Biology and Engineering, University of California, Santa Barbara, CA 93106, USA
25
Kwon JJ, Pan J, Gonzalez G, Hahn WC, Zitnik M. On knowing a gene: A distributional hypothesis of gene function. Cell Syst 2024; 15:488-496. [PMID: 38810640 PMCID: PMC11189734 DOI: 10.1016/j.cels.2024.04.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 02/25/2024] [Accepted: 04/30/2024] [Indexed: 05/31/2024]
Abstract
As words can have multiple meanings that depend on sentence context, genes can have various functions that depend on the surrounding biological system. This pleiotropic nature of gene function is limited by ontologies, which annotate gene functions without considering biological contexts. We contend that the gene function problem in genetics may be informed by recent technological leaps in natural language processing, in which representations of word semantics can be automatically learned from diverse language contexts. In contrast to efforts to model semantics as "is-a" relationships in the 1990s, modern distributional semantics represents words as vectors in a learned semantic space and fuels current advances in transformer-based models such as large language models and generative pre-trained transformers. A similar shift in thinking of gene functions as distributions over cellular contexts may enable a similar breakthrough in data-driven learning from large biological datasets to inform gene function.
Affiliation(s)
- Jason J Kwon
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Joshua Pan
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Guadalupe Gonzalez
- Department of Computing, Faculty of Engineering, Imperial College, London SW7 2AZ, UK
- William C Hahn
- Dana-Farber Cancer Institute and Harvard Medical School, Department of Medical Oncology, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Marinka Zitnik
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard Medical School, Department of Biomedical Informatics, Boston, MA 02115, USA; Harvard Data Science Initiative, Harvard University, Cambridge, MA 02138, USA; Kempner Institute for the Study of Natural and Artificial Intelligence, Harvard University, Allston, MA 02134, USA
26
Lucaci AG, Pond SLK. AOC: Analysis of Orthologous Collections - an application for the characterization of natural selection in protein-coding sequences. arXiv 2024:arXiv:2406.09522v1. [PMID: 38947939 PMCID: PMC11213150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/02/2024]
Abstract
Motivation: Modern molecular sequence analysis increasingly relies on automated and robust software tools for interpretation, annotation, and biological insight. The Analysis of Orthologous Collections (AOC) application automates the identification of genomic sites and species/lineages influenced by natural selection in coding sequence analysis. AOC quantifies different types of selection: negative, diversifying or directional positive, or differential selection between groups of branches. We include all steps necessary to go from unaligned homologous sequences to complete results and interactive visualizations that are designed to aid in the useful interpretation and contextualization. Results: We are motivated by a desire to make evolutionary analyses as simple as possible, and to close the disparity in the literature between genes which draw a significant amount of interest and those that are largely overlooked and underexplored. We believe that such underappreciated and understudied genetic datasets can hold rich biological information and offer substantial insights into the diverse patterns and processes of evolution, especially if domain experts are able to perform the analyses themselves. Availability and implementation: A Snakemake [Mölder et al., 2021] application implementation is publicly available on GitHub at https://github.com/aglucaci/AnalysisOfOrthologousCollections and is accompanied by software documentation and a tutorial.
Affiliation(s)
- Alexander G Lucaci
- Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY 10021, USA
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY 10021, USA
27
Franco-Romero A, Morbidoni V, Milan G, Sartori R, Wulff J, Romanello V, Armani A, Salviati L, Conte M, Salvioli S, Franceschi C, Buonomo V, Swoboda CO, Grumati P, Pannone L, Martinelli S, Jefferies HB, Dikic I, van der Laan J, Cabreiro F, Millay DP, Tooze SA, Trevisson E, Sandri M. C16ORF70/MYTHO promotes healthy aging in C. elegans and prevents cellular senescence in mammals. J Clin Invest 2024; 134:e165814. [PMID: 38869949 PMCID: PMC11291266 DOI: 10.1172/jci165814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2022] [Accepted: 06/07/2024] [Indexed: 06/15/2024] Open
Abstract
The identification of genes that confer either extension of life span or accelerate age-related decline was a step forward in understanding the mechanisms of aging and revealed that it is partially controlled by genetics and transcriptional programs. Here, we discovered that the human DNA sequence C16ORF70 encodes a protein, named MYTHO (macroautophagy and youth optimizer), which controls life span and health span. MYTHO protein is conserved from Caenorhabditis elegans to humans and its mRNA was upregulated in aged mice and elderly people. Deletion of the orthologous myt-1 gene in C. elegans dramatically shortened life span and decreased animal survival upon exposure to oxidative stress. Mechanistically, MYTHO is required for autophagy likely because it acts as a scaffold that binds WIPI2 and BCAS3 to recruit and assemble the conjugation system at the phagophore, the nascent autophagosome. We conclude that MYTHO is a transcriptionally regulated initiator of autophagy that is central in promoting stress resistance and healthy aging.
Affiliation(s)
- Anais Franco-Romero
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Veneto Institute of Molecular Medicine, Padova, Italy
- Valeria Morbidoni
  - Clinical Genetics Unit, Department of Women’s and Children’s Health, University of Padova, Padova, Italy
  - Pediatric Research Institute (IRP) - Fondazione Città della Speranza, Padova, Italy
- Giulia Milan
  - Department of Cardiac Surgery, University Hospital Basel and Department of Biomedicine, University of Basel, Basel, Switzerland
- Roberta Sartori
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Veneto Institute of Molecular Medicine, Padova, Italy
- Jesper Wulff
  - Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, Frankfurt am Main, Germany
- Vanina Romanello
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Veneto Institute of Molecular Medicine, Padova, Italy
- Andrea Armani
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Veneto Institute of Molecular Medicine, Padova, Italy
- Leonardo Salviati
  - Clinical Genetics Unit, Department of Women’s and Children’s Health, University of Padova, Padova, Italy
  - Pediatric Research Institute (IRP) - Fondazione Città della Speranza, Padova, Italy
- Maria Conte
  - Department of Medical and Surgical Science (DIMEC), University of Bologna, Bologna, Italy
- Stefano Salvioli
  - Department of Medical and Surgical Science (DIMEC), University of Bologna, Bologna, Italy
  - IRCCS Azienda Ospedaliero-Universitaria di Bologna, Bologna, Italy
- Claudio Franceschi
  - Institute of Information Technologies, Mathematics and Mechanics, Lobachevsky State University, Nizhny Novgorod, Russia
- Viviana Buonomo
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
- Casey O. Swoboda
  - Division of Molecular Cardiovascular Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Division of Biomedical Informatics, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
- Paolo Grumati
  - Telethon Institute of Genetics and Medicine (TIGEM), Pozzuoli, Italy
  - Department of Clinical Medicine and Surgery, University of Naples Federico II, Naples, Italy
- Luca Pannone
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- Simone Martinelli
  - Department of Oncology and Molecular Medicine, Istituto Superiore di Sanità, Rome, Italy
- Harold B.J. Jefferies
  - The Francis Crick Institute, Molecular Cell Biology of Autophagy, London, United Kingdom
- Ivan Dikic
  - Institute of Biochemistry II, Goethe University Frankfurt - Medical Faculty, University Hospital, Frankfurt am Main, Germany
  - Buchmann Institute for Molecular Life Sciences, Goethe University Frankfurt - Riedberg Campus, Frankfurt am Main, Germany
- Jennifer van der Laan
  - CECAD Research Cluster, University of Cologne, Cologne, Germany
  - Institute of Clinical Sciences, Imperial College London, Hammersmith Hospital Campus, London, UK
- Filipe Cabreiro
  - CECAD Research Cluster, University of Cologne, Cologne, Germany
  - Institute of Clinical Sciences, Imperial College London, Hammersmith Hospital Campus, London, UK
- Douglas P. Millay
  - Division of Molecular Cardiovascular Biology, Cincinnati Children’s Hospital Medical Center, Cincinnati, Ohio, USA
  - Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, Ohio, USA
- Sharon A. Tooze
  - The Francis Crick Institute, Molecular Cell Biology of Autophagy, London, United Kingdom
- Eva Trevisson
  - Clinical Genetics Unit, Department of Women’s and Children’s Health, University of Padova, Padova, Italy
  - Pediatric Research Institute (IRP) - Fondazione Città della Speranza, Padova, Italy
- Marco Sandri
  - Department of Biomedical Sciences, University of Padova, Padova, Italy
  - Veneto Institute of Molecular Medicine, Padova, Italy
  - Myology Center, University of Padova, Padova, Italy
  - Department of Medicine, McGill University, Montreal, Canada
28
Oba GM, Nakato R. Clover: An unbiased method for prioritizing differentially expressed genes using a data-driven approach. Genes Cells 2024; 29:456-470. [PMID: 38602264 PMCID: PMC11163938 DOI: 10.1111/gtc.13119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2024] [Revised: 03/12/2024] [Accepted: 03/20/2024] [Indexed: 04/12/2024]
Abstract
Identifying key genes from a list of differentially expressed genes (DEGs) is a critical step in transcriptome analysis. However, current methods, including Gene Ontology analysis and manual annotation, essentially rely on existing knowledge, which is highly biased depending on the extent of the literature. As a result, understudied genes, some of which may be associated with important molecular mechanisms, are often ignored or remain obscure. To address this problem, we propose Clover, a data-driven scoring method to specifically highlight understudied genes. Clover aims to prioritize genes associated with important molecular mechanisms by integrating three metrics: the likelihood of appearing in the DEG list, tissue specificity, and number of publications. We applied Clover to Alzheimer's disease data and confirmed that it successfully detected known associated genes. Moreover, Clover effectively prioritized understudied but potentially druggable genes. Overall, our method offers a novel approach to gene characterization and has the potential to expand our understanding of gene functions. Clover is an open-source software written in Python3 and available on GitHub at https://github.com/G708/Clover.
Affiliation(s)
- Gina Miku Oba
  - Laboratory of Computational Genomics, Institute for Quantitative Biosciences, University of Tokyo, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Tokyo, Japan
- Ryuichiro Nakato
  - Laboratory of Computational Genomics, Institute for Quantitative Biosciences, University of Tokyo, Tokyo, Japan
  - Department of Computational Biology and Medical Science, Graduate School of Frontier Science, University of Tokyo, Tokyo, Japan
29
Ullman MT, Clark GM, Pullman MY, Lovelett JT, Pierpont EI, Jiang X, Turkeltaub PE. The neuroanatomy of developmental language disorder: a systematic review and meta-analysis. Nat Hum Behav 2024; 8:962-975. [PMID: 38491094 DOI: 10.1038/s41562-024-01843-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 02/01/2024] [Indexed: 03/18/2024]
Abstract
Developmental language disorder (DLD) is a common neurodevelopmental disorder with adverse impacts that continue into adulthood. However, its neural bases remain unclear. Here we address this gap by systematically identifying and quantitatively synthesizing neuroanatomical studies of DLD using co-localization likelihood estimation, a recently developed neuroanatomical meta-analytic technique. Analyses of structural brain data (22 peer-reviewed papers, 577 participants) revealed highly consistent anomalies only in the basal ganglia (100% of participant groups in which this structure was examined, weighted by group sample sizes; 99.8% permutation-based likelihood the anomaly clustering was not due to chance). These anomalies were localized specifically to the anterior neostriatum (again 100% weighted proportion and 99.8% likelihood). As expected given the task dependence of activation, functional neuroimaging data (11 peer-reviewed papers, 414 participants) yielded less consistency, though anomalies again occurred primarily in the basal ganglia (79.0% and 95.1%). Multiple sensitivity analyses indicated that the patterns were robust. The meta-analyses elucidate the neuroanatomical signature of DLD, and implicate the basal ganglia in particular. The findings support the procedural circuit deficit hypothesis of DLD, have basic research and translational implications for the disorder, and advance our understanding of the neuroanatomy of language.
Affiliation(s)
- Michael T Ullman
  - Brain and Language Laboratory, Department of Neuroscience, Georgetown University, Washington DC, USA
- Gillian M Clark
  - Cognitive Neuroscience Unit, School of Psychology, Deakin University, Geelong, Victoria, Australia
- Mariel Y Pullman
  - Brain and Language Laboratory, Department of Neuroscience, Georgetown University, Washington DC, USA
  - Mount Sinai Beth Israel, New York, NY, USA
- Jarrett T Lovelett
  - Brain and Language Laboratory, Department of Neuroscience, Georgetown University, Washington DC, USA
  - Department of Psychology, University of California, San Diego, La Jolla, CA, USA
- Elizabeth I Pierpont
  - Department of Pediatrics, University of Minnesota Medical Center, Minneapolis, MN, USA
- Xiong Jiang
  - Department of Neuroscience, Georgetown University, Washington DC, USA
- Peter E Turkeltaub
  - Center for Brain Plasticity and Recovery, Georgetown University, Washington DC, USA
  - Research Division, MedStar National Rehabilitation Network, Washington DC, USA
30
Richardson R, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: Understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. eLife 2024; 12:RP93429. [PMID: 38546716 PMCID: PMC10977968 DOI: 10.7554/elife.93429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/01/2024] Open
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese Richardson
  - Interdisciplinary Biological Sciences, Northwestern University, Evanston, United States
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - Northwestern Institute on Complex Systems, Northwestern University, Evanston, United States
  - Department of Molecular Biosciences, Northwestern University, Evanston, United States
  - Department of Physics and Astronomy, Northwestern University, Evanston, United States
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States
  - The Potocsnak Longevity Institute, Northwestern University, Chicago, United States
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University, Chicago, United States
31
Richardson RAK, Tejedor Navarro H, Amaral LAN, Stoeger T. Meta-Research: understudied genes are lost in a leaky pipeline between genome-wide assays and reporting of results. bioRxiv 2024:2023.02.28.530483. [PMID: 36909550 PMCID: PMC10002660 DOI: 10.1101/2023.02.28.530483] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/06/2023]
Abstract
Present-day publications on human genes primarily feature genes that already appeared in many publications prior to completion of the Human Genome Project in 2003. These patterns persist despite the subsequent adoption of high-throughput technologies, which routinely identify novel genes associated with biological processes and disease. Although several hypotheses for bias in the selection of genes as research targets have been proposed, their explanatory powers have not yet been compared. Our analysis suggests that understudied genes are systematically abandoned in favor of better-studied genes between the completion of -omics experiments and the reporting of results. Understudied genes remain abandoned by studies that cite these -omics experiments. Conversely, we find that publications on understudied genes may even accrue a greater number of citations. Among 45 biological and experimental factors previously proposed to affect which genes are being studied, we find that 33 are significantly associated with the choice of hit genes presented in titles and abstracts of -omics studies. To promote the investigation of understudied genes, we condense our insights into a tool, find my understudied genes (FMUG), that allows scientists to engage with potential bias during the selection of hits. We demonstrate the utility of FMUG through the identification of genes that remain understudied in vertebrate aging. FMUG is developed in Flutter and is available for download at fmug.amaral.northwestern.edu as a MacOS/Windows app.
Affiliation(s)
- Reese AK Richardson
  - Interdisciplinary Biological Sciences, Northwestern University
  - Department of Chemical and Biological Engineering, Northwestern University
- Heliodoro Tejedor Navarro
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
- Luis A Nunes Amaral
  - Department of Chemical and Biological Engineering, Northwestern University
  - Northwestern Institute on Complex Systems, Northwestern University
  - Department of Physics and Astronomy, Northwestern University
  - Department of Molecular Biosciences, Northwestern University
- Thomas Stoeger
  - Department of Chemical and Biological Engineering, Northwestern University
  - The Potocsnak Longevity Institute, Northwestern University
  - Simpson Querrey Lung Institute for Translational Science, Northwestern University
32
Koutrouli M, Nastou K, Piera Líndez P, Bouwmeester R, Rasmussen S, Martens L, Jensen LJ. FAVA: high-quality functional association networks inferred from scRNA-seq and proteomics data. Bioinformatics 2024; 40:btae010. [PMID: 38192003 PMCID: PMC10868155 DOI: 10.1093/bioinformatics/btae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Revised: 12/07/2023] [Accepted: 01/05/2024] [Indexed: 01/10/2024] Open
Abstract
Motivation: Protein networks are commonly used for understanding how proteins interact. However, they are typically biased by data availability, favoring well-studied proteins with more interactions. To uncover functions of understudied proteins, we must use data that are not affected by this literature bias, such as single-cell RNA-seq and proteomics. Due to data sparseness and redundancy, functional association analysis becomes complex. Results: To address this, we have developed FAVA (Functional Associations using Variational Autoencoders), which compresses high-dimensional data into a low-dimensional space. FAVA infers networks from high-dimensional omics data with much higher accuracy than existing methods, across a diverse collection of real as well as simulated datasets. FAVA can process large datasets with over 0.5 million conditions and has predicted 4210 interactions between 1039 understudied proteins. Our findings showcase FAVA's capability to offer novel perspectives on protein interactions. FAVA functions within the scverse ecosystem, employing AnnData as its input source. Availability and implementation: Source code, documentation, and tutorials for FAVA are accessible on GitHub at https://github.com/mikelkou/fava. FAVA can also be installed and used via pip/PyPI as well as via the scverse ecosystem https://github.com/scverse/ecosystem-packages/tree/main/packages/favapy.
Affiliation(s)
- Mikaela Koutrouli
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Katerina Nastou
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Pau Piera Líndez
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Robbin Bouwmeester
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
- Lennart Martens
  - VIB-UGent Center for Medical Biotechnology, VIB, 9052 Ghent, Belgium
  - Department of Biomolecular Medicine, Ghent University, 9052 Ghent, Belgium
- Lars Juhl Jensen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark
33
Schubach M, Maass T, Nazaretyan L, Röner S, Kircher M. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Res 2024; 52:D1143-D1154. [PMID: 38183205 PMCID: PMC10767851 DOI: 10.1093/nar/gkad989] [Citation(s) in RCA: 118] [Impact Index Per Article: 118.0] [Received: 09/14/2023] [Revised: 10/14/2023] [Accepted: 10/17/2023] [Indexed: 01/07/2024] Open
Abstract
Machine Learning-based scoring and classification of genetic variants aids the assessment of clinical findings and is employed to prioritize variants in diverse genetic studies and analyses. Combined Annotation-Dependent Depletion (CADD) is one of the first methods for the genome-wide prioritization of variants across different molecular functions and has been continuously developed and improved since its original publication. Here, we present our most recent release, CADD v1.7. We explored and integrated new annotation features, among them state-of-the-art protein language model scores (Meta ESM-1v), regulatory variant effect predictions (from sequence-based convolutional neural networks) and sequence conservation scores (Zoonomia). We evaluated the new version on data sets derived from ClinVar, ExAC/gnomAD and 1000 Genomes variants. For coding effects, we tested CADD on 31 Deep Mutational Scanning (DMS) data sets from ProteinGym and, for regulatory effect prediction, we used saturation mutagenesis reporter assay data of promoter and enhancer sequences. The inclusion of new features further improved the overall performance of CADD. As with previous releases, all data sets, genome-wide CADD v1.7 scores, scripts for on-site scoring and an easy-to-use webserver are readily provided via https://cadd.bihealth.org/ or https://cadd.gs.washington.edu/ to the community.
Affiliation(s)
- Max Schubach
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Thorben Maass
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
- Lusiné Nazaretyan
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Sebastian Röner
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
- Martin Kircher
  - Exploratory Diagnostic Sciences, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  - Institute of Human Genetics, University Hospital Schleswig-Holstein, University of Lübeck, Lübeck, Germany
34
Rappsilber J. A dive into the unknome. Trends Genet 2024; 40:15-16. [PMID: 37968205 DOI: 10.1016/j.tig.2023.10.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2023] [Accepted: 10/23/2023] [Indexed: 11/17/2023]
Abstract
We may never understand the function of all genes, findings by Freeman, Munro and colleagues suggest, unless we rethink our approaches. They make a thorough attempt at quantifying the unknownness of protein-coding genes and experimentally prove that many neglected genes hold the seed of important discoveries.
Affiliation(s)
- Juri Rappsilber
  - Technische Universität Berlin, Chair of Bioanalytics, 10623 Berlin, Germany
  - Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh, EH9 3BF, UK
  - Si-M/'Der Simulierte Mensch', a Science Framework of Technische Universität Berlin and Charité - Universitätsmedizin Berlin, Berlin, Germany
35
Kurt Z, Cheng J, Barrere-Cain R, McQuillen CN, Saleem Z, Hsu N, Jiang N, Pan C, Franzén O, Koplev S, Wang S, Björkegren J, Lusis AJ, Blencowe M, Yang X. Shared and distinct pathways and networks genetically linked to coronary artery disease between human and mouse. eLife 2023; 12:RP88266. [PMID: 38060277 PMCID: PMC10703441 DOI: 10.7554/elife.88266] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/08/2023] Open
Abstract
Mouse models have been used extensively to study human coronary artery disease (CAD) or atherosclerosis and to test therapeutic targets. However, whether mouse and human share similar genetic factors and pathogenic mechanisms of atherosclerosis has not been thoroughly investigated in a data-driven manner. We conducted a cross-species comparison study to better understand atherosclerosis pathogenesis between species by leveraging multiomics data. Specifically, we compared genetically driven and thus CAD-causal gene networks and pathways, by using human GWAS of CAD from the CARDIoGRAMplusC4D consortium and mouse GWAS of atherosclerosis from the Hybrid Mouse Diversity Panel (HMDP) followed by integration with functional multiomics human (STARNET and GTEx) and mouse (HMDP) databases. We found that mouse and human shared >75% of CAD causal pathways. Based on network topology, we then predicted key regulatory genes for both the shared pathways and species-specific pathways, which were further validated through the use of single cell data and the latest CAD GWAS. In sum, our results should serve as a much-needed guidance for which human CAD-causal pathways can or cannot be further evaluated for novel CAD therapies using mouse models.
Affiliation(s)
- Zeyneb Kurt
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
  - The Information School at the University of Sheffield, Sheffield, United Kingdom
- Jenny Cheng
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
  - Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, Los Angeles, United States
- Rio Barrere-Cain
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Caden N McQuillen
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Zara Saleem
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Neil Hsu
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Nuoya Jiang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Calvin Pan
  - Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, United States
- Oscar Franzén
  - Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Simon Koplev
  - Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
- Susanna Wang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
- Johan Björkegren
  - Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, United States
  - Department of Medicine, (Huddinge), Karolinska Institutet, Huddinge, Sweden
- Aldons J Lusis
  - Department of Medicine, Division of Cardiology, University of California, Los Angeles, Los Angeles, United States
  - Departments of Human Genetics & Microbiology, Immunology, and Molecular Genetics, UCLA, Los Angeles, United States
  - Cardiovascular Research Laboratory, David Geffen School of Medicine, UCLA, Los Angeles, United States
- Montgomery Blencowe
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
  - Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, Los Angeles, United States
- Xia Yang
  - Department of Integrative Biology and Physiology, University of California, Los Angeles, Los Angeles, United States
  - Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, Los Angeles, United States
  - Interdepartmental Program of Bioinformatics, University of California, Los Angeles, Los Angeles, United States
  - Department of Molecular and Medical Pharmacology, University of California, Los Angeles, Los Angeles, United States
36
Gill K, Rajan JRS, Chow E, Ashbrook DG, Williams RW, Zwicker JG, Goldowitz D. Developmental coordination disorder: What can we learn from RI mice using motor learning tasks and QTL analysis. Genes Brain Behav 2023; 22:e12859. [PMID: 37553802 PMCID: PMC10733574 DOI: 10.1111/gbb.12859] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2022] [Revised: 07/13/2023] [Accepted: 07/16/2023] [Indexed: 08/10/2023]
Abstract
Developmental Coordination Disorder (DCD) is a neurodevelopmental disorder of unknown etiology that affects one in 20 children. There is an indication that DCD has an underlying genetic component due to its high heritability. Therefore, we explored the use of a recombinant inbred family of mice known as the BXD panel to understand the genetic basis of complex traits (i.e., motor learning) through identification of quantitative trait loci (QTLs). The overall aim of this study was to utilize the QTL approach to evaluate the genome-to-phenome correlation in BXD strains of mice in order to better understand the human presentation of DCD. Results of this current study confirm differences in motor learning in selected BXD strains and strains with altered cerebellar volume. Five strains - BXD15, BXD27, BXD28, BXD75, and BXD86 - exhibited the most DCD-like phenotype when compared with other BXD strains of interest. Results indicate that BXD15 and BXD75 struggled primarily with gross motor skills, BXD28 primarily had difficulties with fine motor skills, and BXD27 and BXD86 strains struggled with both fine and gross motor skills. The functional roles of genes within significant QTLs were assessed in relation to DCD-like behavior. Only Rab3a (Ras-related protein Rab-3A) emerged as a high likelihood candidate gene for the horizontal ladder rung task. This gene is associated with brain and skeletal muscle development, but lacked nonsynonymous polymorphisms. This study, along with Gill et al. (same issue), is the first to specifically examine the genetic linkage of DCD using BXD strains of mice.
Collapse
Affiliation(s)
- Kamaldeep Gill
- Rehabilitation Sciences, University of British Columbia, Vancouver, British Columbia, Canada
- British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
| | - Jeffy Rajan Soundara Rajan
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Eric Chow
- British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
| | - David G. Ashbrook
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Robert W. Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, Tennessee, USA
| | - Jill G. Zwicker
- British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Occupational Science & Occupational Therapy, University of British Columbia, Vancouver, British Columbia, Canada
- Department of Pediatrics, University of British Columbia, Vancouver, British Columbia, Canada
| | - Daniel Goldowitz
- British Columbia Children's Hospital Research Institute, Vancouver, British Columbia, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, University of British Columbia, Vancouver, British Columbia, Canada
|
37
|
Allayee H, Farber CR, Seldin MM, Williams EG, James DE, Lusis AJ. Systems genetics approaches for understanding complex traits with relevance for human disease. eLife 2023; 12:e91004. [PMID: 37962168 PMCID: PMC10645424 DOI: 10.7554/elife.91004] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/14/2023] [Accepted: 10/16/2023] [Indexed: 11/15/2023] Open
Abstract
Quantitative traits are often complex because of the contribution of many loci, with further complexity added by environmental factors. In medical research, systems genetics is a powerful approach for the study of complex traits, as it integrates intermediate phenotypes, such as RNA, protein, and metabolite levels, to understand molecular and physiological phenotypes linking discrete DNA sequence variation to complex clinical and physiological traits. The primary purpose of this review is to describe some of the resources and tools of systems genetics in humans and rodent models, so that researchers in many areas of biology and medicine can make use of the data.
Affiliation(s)
- Hooman Allayee
- Departments of Population & Public Health Sciences, University of Southern California, Los Angeles, United States
- Biochemistry & Molecular Medicine, Keck School of Medicine, University of Southern California, Los Angeles, United States
| | - Charles R Farber
- Center for Public Health Genomics, University of Virginia School of Medicine, Charlottesville, United States
- Departments of Biochemistry & Molecular Genetics, University of Virginia School of Medicine, Charlottesville, United States
- Public Health Sciences, University of Virginia School of Medicine, Charlottesville, United States
| | - Marcus M Seldin
- Department of Biological Chemistry, University of California, Irvine, Irvine, United States
| | - Evan Graehl Williams
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Luxembourg, Luxembourg
| | - David E James
- School of Life and Environmental Sciences, University of Sydney, Camperdown, Australia
- Faculty of Medicine and Health, University of Sydney, Camperdown, Australia
- Charles Perkins Centre, University of Sydney, Camperdown, Australia
| | - Aldons J Lusis
- Departments of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Medicine, University of California, Los Angeles, Los Angeles, United States
- Microbiology, Immunology, & Molecular Genetics, David Geffen School of Medicine of UCLA, Los Angeles, United States
|
38
|
Hogan CA, Gratz SJ, Dumouchel JL, Thakur RS, Delgado A, Lentini JM, Madhwani KR, Fu D, O'Connor‐Giles KM. Expanded tRNA methyltransferase family member TRMT9B regulates synaptic growth and function. EMBO Rep 2023; 24:e56808. [PMID: 37642556 PMCID: PMC10561368 DOI: 10.15252/embr.202356808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 08/31/2023] Open
Abstract
Nervous system function rests on the formation of functional synapses between neurons. We have identified TRMT9B as a new regulator of synapse formation and function in Drosophila. TRMT9B has been studied for its role as a tumor suppressor and is one of two metazoan homologs of yeast tRNA methyltransferase 9 (Trm9), which methylates tRNA wobble uridines. Whereas Trm9 homolog ALKBH8 is ubiquitously expressed, TRMT9B is enriched in the nervous system. However, in the absence of animal models, TRMT9B's role in the nervous system has remained unstudied. Here, we generate null alleles of TRMT9B and find it acts postsynaptically to regulate synaptogenesis and promote neurotransmission. Through liquid chromatography-mass spectrometry, we find that ALKBH8 catalyzes canonical tRNA wobble uridine methylation, raising the question of whether TRMT9B is a methyltransferase. Structural modeling studies suggest TRMT9B retains methyltransferase function and, in vivo, disruption of key methyltransferase residues blocks TRMT9B's ability to rescue synaptic overgrowth, but not neurotransmitter release. These findings reveal distinct roles for TRMT9B in the nervous system and highlight the significance of tRNA methyltransferase family diversification in metazoans.
Affiliation(s)
- Caley A Hogan
- Genetics Training Program, University of Wisconsin‐Madison, Madison, WI, USA
| | - Scott J Gratz
- Department of Neuroscience, Brown University, Providence, RI, USA
| | | | - Rajan S Thakur
- Department of Neuroscience, Brown University, Providence, RI, USA
| | - Ambar Delgado
- Department of Neuroscience, Brown University, Providence, RI, USA
| | - Jenna M Lentini
- Department of Biology, Center for RNA Biology, University of Rochester, Rochester, NY, USA
| | | | - Dragony Fu
- Department of Biology, Center for RNA Biology, University of Rochester, Rochester, NY, USA
| | - Kate M O'Connor‐Giles
- Department of Neuroscience, Brown University, Providence, RI, USA
- Carney Institute for Brain Science, Providence, RI, USA
|
39
|
Rodríguez-López M, Bordin N, Lees J, Scholes H, Hassan S, Saintain Q, Kamrad S, Orengo C, Bähler J. Broad functional profiling of fission yeast proteins using phenomics and machine learning. eLife 2023; 12:RP88229. [PMID: 37787768 PMCID: PMC10547477 DOI: 10.7554/elife.88229] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 10/04/2023] Open
Abstract
Many proteins remain poorly characterized even in well-studied organisms, presenting a bottleneck for research. We applied phenomics and machine-learning approaches with Schizosaccharomyces pombe for broad cues on protein functions. We assayed colony-growth phenotypes to measure the fitness of deletion mutants for 3509 non-essential genes in 131 conditions with different nutrients, drugs, and stresses. These analyses exposed phenotypes for 3492 mutants, including 124 mutants of 'priority unstudied' proteins conserved in humans, providing varied functional clues. For example, over 900 proteins were newly implicated in the resistance to oxidative stress. Phenotype-correlation networks suggested roles for poorly characterized proteins through 'guilt by association' with known proteins. For complementary functional insights, we predicted Gene Ontology (GO) terms using machine learning methods exploiting protein-network and protein-homology data (NET-FF). We obtained 56,594 high-scoring GO predictions, of which 22,060 also featured high information content. Our phenotype-correlation data and NET-FF predictions showed a strong concordance with existing PomBase GO annotations and protein networks, with integrated analyses revealing 1675 novel GO predictions for 783 genes, including 47 predictions for 23 priority unstudied proteins. Experimental validation identified new proteins involved in cellular aging, showing that these predictions and phenomics data provide a rich resource to uncover new protein functions.
Affiliation(s)
- María Rodríguez-López
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Nicola Bordin
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jon Lees
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
- University of Bristol, Bristol, United Kingdom
| | - Harry Scholes
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Shaimaa Hassan
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
- Helwan University, Faculty of Pharmacy, Cairo, Egypt
| | - Quentin Saintain
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Stephan Kamrad
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
| | - Christine Orengo
- University College London, Institute of Structural and Molecular Biology, London, United Kingdom
| | - Jürg Bähler
- University College London, Institute of Healthy Ageing and Department of Genetics, Evolution & Environment, London, United Kingdom
|
40
|
Kurt Z, Cheng J, McQuillen CN, Saleem Z, Hsu N, Jiang N, Barrere-Cain R, Pan C, Franzen O, Koplev S, Wang S, Bjorkegren J, Lusis AJ, Blencowe M, Yang X. Shared and distinct pathways and networks genetically linked to coronary artery disease between human and mouse. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.06.08.544148. [PMID: 37333408 PMCID: PMC10274918 DOI: 10.1101/2023.06.08.544148] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/20/2023]
Abstract
Mouse models have been used extensively to study human coronary artery disease (CAD) or atherosclerosis and to test therapeutic targets. However, whether mouse and human share similar genetic factors and pathogenic mechanisms of atherosclerosis has not been thoroughly investigated in a data-driven manner. We conducted a cross-species comparison study to better understand atherosclerosis pathogenesis between species by leveraging multiomics data. Specifically, we compared genetically driven and thus CAD-causal gene networks and pathways, by using human GWAS of CAD from the CARDIoGRAMplusC4D consortium and mouse GWAS of atherosclerosis from the Hybrid Mouse Diversity Panel (HMDP), followed by integration with functional multiomics human (STARNET and GTEx) and mouse (HMDP) databases. We found that mouse and human shared >75% of CAD causal pathways. Based on network topology, we then predicted key regulatory genes for both the shared pathways and species-specific pathways, which were further validated through the use of single cell data and the latest CAD GWAS. In sum, our results should serve as much-needed guidance on which human CAD-causal pathways can or cannot be further evaluated for novel CAD therapies using mouse models.
Affiliation(s)
- Zeyneb Kurt
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Department of Computer and Information Sciences, University of Northumbria, Ellison Pl, Newcastle upon Tyne NE1 8ST, UK
| | - Jenny Cheng
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Caden N. McQuillen
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Zara Saleem
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Neil Hsu
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Nuoya Jiang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Rio Barrere-Cain
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Calvin Pan
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, 650 Charles E Young Drive South, Los Angeles, CA 90095-1679, USA
| | - Oscar Franzen
- Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029-6574, US
| | - Simon Koplev
- Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029-6574, US
| | - Susanna Wang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Johan Bjorkegren
- Department of Genetics & Genomic Sciences, Institute of Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029-6574, US
- Department of Medicine, (Huddinge), Karolinska Institutet, 141 57 Huddinge, Sweden
| | - Aldons J. Lusis
- Department of Medicine, Division of Cardiology, University of California, Los Angeles, 650 Charles E Young Drive South, Los Angeles, CA 90095-1679, USA
- Departments of Human Genetics & Microbiology, Immunology, and Molecular Genetics, UCLA, CA 90095, USA
- Cardiovascular Research Laboratory, David Geffen School of Medicine, UCLA, CA 90095
| | - Montgomery Blencowe
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
| | - Xia Yang
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Molecular, Cellular and Integrative Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Interdepartmental Program of Bioinformatics, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
- Department of Molecular and Medical Pharmacology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, CA 90095, USA
|
41
|
Anderson B, Rosston P, Ong HW, Hossain MA, Davis-Gilbert ZW, Drewry DH. How many kinases are druggable? A review of our current understanding. Biochem J 2023; 480:1331-1363. [PMID: 37642371 PMCID: PMC10586788 DOI: 10.1042/bcj20220217] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 05/04/2023] [Revised: 08/11/2023] [Accepted: 08/15/2023] [Indexed: 08/31/2023]
Abstract
There are over 500 human kinases ranging from very well-studied to almost completely ignored. Kinases are tractable and implicated in many diseases, making them ideal targets for medicinal chemistry campaigns, but is it possible to discover a drug for each individual kinase? For every human kinase, we gathered data on its citation count, availability of chemical probes, approved and investigational drugs, PDB structures, and biochemical and cellular assays. Analysis of these factors highlights which kinase groups have a wealth of information available, and which groups still have room for progress. The data suggest a disproportionate focus on the more well characterized kinases while much of the kinome remains comparatively understudied. It is noteworthy that tool compounds for understudied kinases have already been developed, and there is still untapped potential for further development in this chemical space. Finally, this review discusses many of the different strategies employed to generate selectivity between kinases. Given the large volume of information available and the progress made over the past 20 years when it comes to drugging kinases, we believe it is possible to develop a tool compound for every human kinase. We hope this review will prove to be a useful resource and will inspire the discovery of a tool for every kinase.
Affiliation(s)
- Brian Anderson
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Peter Rosston
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- Department of Chemistry, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Han Wee Ong
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Mohammad Anwar Hossain
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - Zachary W. Davis-Gilbert
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
| | - David H. Drewry
- Structural Genomics Consortium, UNC Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
- UNC Lineberger Comprehensive Cancer Center, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, U.S.A
|
42
|
Tantoso E, Eisenhaber B, Sinha S, Jensen LJ, Eisenhaber F. Did the early full genome sequencing of yeast boost gene function discovery? Biol Direct 2023; 18:46. [PMID: 37574542 PMCID: PMC10424406 DOI: 10.1186/s13062-023-00403-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2023] [Accepted: 08/01/2023] [Indexed: 08/15/2023] Open
Abstract
BACKGROUND: Although the genome of Saccharomyces cerevisiae (S. cerevisiae) was the first eukaryotic genome to be fully sequenced (in 1996), a complete understanding of the potential of the encoded biomolecular mechanisms has not yet been achieved. Here, we wish to quantify how far away the goal of a full list of S. cerevisiae gene functions still is. RESULTS: The scientific literature about S. cerevisiae protein-coding genes has been mapped onto the yeast genome via the mentions of names for genomic regions in scientific publications. The match was quantified with the ratio of a given gene name's occurrences to those of any gene names in the article. We find that ~230 elite genes with ≥75 full publication equivalents (FPEs; FPE = 1 is an idealized publication referring to just a single gene) command ~45% of all literature. At the same time, about two thirds of the genes (each with fewer than 10 FPEs) are described in just 12% of the literature (on average, each such gene has just ~1.5% of the literature of an elite gene). About 600 genes have not been mentioned in any dedicated article. Compared with other groups of genes, the literature growth rates were highest for uncharacterized or understudied genes until the late 1990s. Yet, these growth rates deteriorated and became negative thereafter. Thus, yeast function discovery for previously uncharacterized genes has returned to the level of ~1980. At the same time, literature for already well-studied genes (with a threshold T10 (≥10 FPEs) and higher) keeps growing steadily. CONCLUSIONS: Did the early full genome sequencing of yeast boost gene function discovery? The data show that the publication of the full genome in fact coincides with the onset of the decline in gene function discovery for previously uncharacterized genes. If the current status of literature about yeast molecular mechanisms can be extrapolated into the future, it will take about another 50 years to complete the yeast gene function list. We found that a small group of scientific journals contributed extraordinarily to publishing early reports relevant to yeast gene function discoveries.
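The full publication equivalent (FPE) measure described in this abstract can be illustrated with a small sketch. It assumes, as the abstract states, that each article contributes to a gene the ratio of that gene's name occurrences to all gene-name occurrences in the article, so an article mentioning a single gene contributes exactly 1. The function names, the input layout (one list of gene-name mentions per article) and the toy data are hypothetical and are not taken from the paper's own pipeline.

```python
from collections import Counter


def fpe_per_article(mentions):
    """Split one article's weight of 1.0 across the genes it mentions.

    `mentions` is a hypothetical input: the list of gene names found in a
    single article, with repeats. Each gene receives
    (its occurrences) / (all gene-name occurrences in the article).
    """
    counts = Counter(mentions)
    total = sum(counts.values())
    if total == 0:
        return {}  # an article that names no gene contributes nothing
    return {gene: n / total for gene, n in counts.items()}


def total_fpe(articles):
    """Sum the per-article shares to get each gene's FPE over a corpus."""
    totals = Counter()
    for mentions in articles:
        for gene, share in fpe_per_article(mentions).items():
            totals[gene] += share
    return dict(totals)


if __name__ == "__main__":
    # Toy corpus (illustrative only): three articles with their gene mentions.
    articles = [
        ["CDC28"],                          # single-gene article: FPE = 1
        ["CDC28", "CDC28", "ACT1", "ACT1"],
        ["ACT1", "URA3", "URA3", "URA3"],
    ]
    for gene, fpe in sorted(total_fpe(articles).items()):
        print(gene, fpe)
```

Under this reading, the FPE values of all genes sum to the number of articles that mention at least one gene, which is what allows statements such as "elite genes command ~45% of all literature" to be expressed as shares of a fixed total.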
Affiliation(s)
- Erwin Tantoso
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
| | - Birgit Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
- LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany.
| | - Swati Sinha
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Lars Juhl Jensen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Frank Eisenhaber
- Agency for Science, Technology and Research (A*STAR), Bioinformatics Institute (BII), 30 Biopolis Street #07-01, Matrix Building, Singapore, 138671, Republic of Singapore.
- Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), 60 Biopolis Street, Singapore, 138672, Republic of Singapore.
- LASA - Lausitz Advanced Scientific Applications gGmbH, Straße Der Einheit 2-24, 02943, Weißwasser, Federal Republic of Germany.
- School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Republic of Singapore.
|
43
|
Potter A, Hangas A, Goffart S, Huynen MA, Cabrera-Orefice A, Spelbrink JN. Uncharacterized protein C17orf80 - a novel interactor of human mitochondrial nucleoids. J Cell Sci 2023; 136:jcs260822. [PMID: 37401363 PMCID: PMC10445727 DOI: 10.1242/jcs.260822] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2022] [Accepted: 06/26/2023] [Indexed: 07/05/2023] Open
Abstract
Molecular functions of many human proteins remain unstudied, despite the demonstrated association with diseases or pivotal molecular structures, such as mitochondrial DNA (mtDNA). This small genome is crucial for the proper functioning of mitochondria, the energy-converting organelles. In mammals, mtDNA is arranged into macromolecular complexes called nucleoids that serve as functional stations for its maintenance and expression. Here, we aimed to explore an uncharacterized protein C17orf80, which was previously detected close to the nucleoid components by proximity labelling mass spectrometry. To investigate the subcellular localization and function of C17orf80, we took advantage of immunofluorescence microscopy, interaction proteomics and several biochemical assays. We demonstrate that C17orf80 is a mitochondrial membrane-associated protein that interacts with nucleoids even when mtDNA replication is inhibited. In addition, we show that C17orf80 is not essential for mtDNA maintenance and mitochondrial gene expression in cultured human cells. These results provide a basis for uncovering the molecular function of C17orf80 and the nature of its association with nucleoids, possibly leading to new insights about mtDNA and its expression.
Affiliation(s)
- Alisa Potter
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Anu Hangas
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Steffi Goffart
- Department of Environmental and Biological Sciences, University of Eastern Finland, Joensuu, 80101, Finland
| | - Martijn A. Huynen
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Alfredo Cabrera-Orefice
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Department of Medical BioSciences, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
| | - Johannes N. Spelbrink
- Department of Pediatrics, Amalia Children's Hospital, Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
- Radboud Center for Mitochondrial Medicine (RCMM), Radboud University Medical Center, Nijmegen, 6525 GA, The Netherlands
|
44
|
Rocha JJ, Jayaram SA, Stevens TJ, Muschalik N, Shah RD, Emran S, Robles C, Freeman M, Munro S. Functional unknomics: Systematic screening of conserved genes of unknown function. PLoS Biol 2023; 21:e3002222. [PMID: 37552676 PMCID: PMC10409296 DOI: 10.1371/journal.pbio.3002222] [Citation(s) in RCA: 28] [Impact Index Per Article: 14.0] [Received: 01/12/2023] [Accepted: 06/27/2023] [Indexed: 08/10/2023] Open
Abstract
The human genome encodes approximately 20,000 proteins, many still uncharacterised. It has become clear that scientific research tends to focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed a publicly available and customisable "Unknome database" that ranks proteins based on how little is known about them. We applied RNA interference (RNAi) in Drosophila to 260 unknown genes that are conserved between flies and humans. Knockdown of some genes resulted in loss of viability, and functional screening of the rest revealed hits for fertility, development, locomotion, protein quality control, and resilience to stress. CRISPR/Cas9 gene disruption validated a component of Notch signalling and 2 genes contributing to male fertility. Our work illustrates the importance of poorly understood genes, provides a resource to accelerate future research, and highlights a need to support database curation to ensure that misannotation does not erode our awareness of our own ignorance.
Affiliation(s)
- João J. Rocha
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Tim J. Stevens
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | | | - Rajen D. Shah
- Centre for Mathematical Sciences, University of Cambridge, Cambridge, United Kingdom
| | - Sahar Emran
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Cristina Robles
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
| | - Matthew Freeman
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
- Sir William Dunn School of Pathology, University of Oxford, Oxford, United Kingdom
| | - Sean Munro
- MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
|
45
|
Kratz A, Kim M, Kelly MR, Zheng F, Koczor CA, Li J, Ono K, Qin Y, Churas C, Chen J, Pillich RT, Park J, Modak M, Collier R, Licon K, Pratt D, Sobol RW, Krogan NJ, Ideker T. A multi-scale map of protein assemblies in the DNA damage response. Cell Syst 2023; 14:447-463.e8. [PMID: 37220749 PMCID: PMC10330685 DOI: 10.1016/j.cels.2023.04.007] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/15/2021] [Revised: 01/30/2023] [Accepted: 04/25/2023] [Indexed: 05/25/2023]
Abstract
The DNA damage response (DDR) ensures error-free DNA replication and transcription and is disrupted in numerous diseases. An ongoing challenge is to determine the proteins orchestrating DDR and their organization into complexes, including constitutive interactions and those responding to genomic insult. Here, we use multi-conditional network analysis to systematically map DDR assemblies at multiple scales. Affinity purifications of 21 DDR proteins, with/without genotoxin exposure, are combined with multi-omics data to reveal a hierarchical organization of 605 proteins into 109 assemblies. The map captures canonical repair mechanisms and proposes new DDR-associated proteins extending to stress, transport, and chromatin functions. We find that protein assemblies closely align with genetic dependencies in processing specific genotoxins and that proteins in multiple assemblies typically act in multiple genotoxin responses. Follow-up by DDR functional readouts newly implicates 12 assembly members in double-strand-break repair. The DNA damage response assemblies map is available for interactive visualization and query (ccmi.org/ddram/).
Affiliation(s)
- Anton Kratz
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
| | - Minkyu Kim
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA; University of Texas Health Science Center San Antonio, Department of Biochemistry and Structural Biology, San Antonio, TX 78229, USA
| | - Marcus R Kelly
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Fan Zheng
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
| | - Christopher A Koczor
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA
| | - Jianfeng Li
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA
| | - Keiichiro Ono
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Yue Qin
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Christopher Churas
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Jing Chen
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Rudolf T Pillich
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Jisoo Park
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
| | - Maya Modak
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA
| | - Rachel Collier
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Kate Licon
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Dexter Pratt
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA
| | - Robert W Sobol
- University of South Alabama, Department of Pharmacology and Mitchell Cancer Institute, Mobile, AL 36604, USA; Brown University, Department of Pathology and Laboratory Medicine and Legorreta Cancer Center, Providence, RI 02903, USA.
| | - Nevan J Krogan
- University of California San Francisco, Department of Cellular and Molecular Pharmacology, San Francisco, CA 94158, USA; The J. David Gladstone Institute of Data Science and Biotechnology, San Francisco, CA 94158, USA; Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA 94158, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA.
| | - Trey Ideker
- University of California San Diego, Department of Medicine, San Diego, CA 92093, USA; The Cancer Cell Map Initiative, San Francisco and La Jolla, CA, USA.
| |
|
46
|
Elsamad G, Mecawi AS, Pauža AG, Gillard B, Paterson A, Duque VJ, Šarenac O, Žigon NJ, Greenwood M, Greenwood MP, Murphy D. Ageing restructures the transcriptome of the hypothalamic supraoptic nucleus and alters the response to dehydration. NPJ Aging 2023; 9:12. [PMID: 37264028 PMCID: PMC10234251 DOI: 10.1038/s41514-023-00108-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 05/04/2023] [Indexed: 06/03/2023]
Abstract
Ageing is associated with altered neuroendocrine function. In the context of the hypothalamic supraoptic nucleus, which makes the antidiuretic hormone vasopressin, ageing alters acute responses to hyperosmotic cues, rendering the elderly more susceptible to dehydration. Chronically, vasopressin has been associated with numerous diseases of old age, including type 2 diabetes and metabolic syndrome. Bulk RNAseq transcriptome analysis has been used to catalogue the polyadenylated supraoptic nucleus transcriptomes of adult (3 months) and aged (18 months) rats in basal euhydrated and stimulated dehydrated conditions. Gene ontology and Weighted Correlation Network Analysis revealed that ageing is associated with alterations in the expression of extracellular matrix genes. Interestingly, whilst the transcriptomic response to dehydration is overall blunted in aged animals compared to adults, there is a specific enrichment of differentially expressed genes related to neurodegenerative processes in the aged cohort, suggesting that dehydration itself may provoke degenerative consequences in aged rats.
Affiliation(s)
- Ghadir Elsamad
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
| | - André Souza Mecawi
- Laboratory of Molecular Neuroendocrinology, Department of Biophysics, Paulista School of Medicine, Federal University of São Paulo, São Paulo, Brazil
| | - Audrys G Pauža
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
- Translational Cardio-Respiratory Research Group, Department of Physiology, Faculty of Medical and Health Sciences, University of Auckland, Auckland, New Zealand
| | - Benjamin Gillard
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
| | - Alex Paterson
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
- Insilico Consulting Ltd., Wapping Wharf, Bristol, England
| | - Victor J Duque
- Laboratory of Molecular Neuroendocrinology, Department of Biophysics, Paulista School of Medicine, Federal University of São Paulo, São Paulo, Brazil
| | - Olivera Šarenac
- Institute of Pharmacology, Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Belgrade, Belgrade, Serbia
- Department of Safety Pharmacology, Abbvie, North Chicago, Illinois, USA
| | - Nina Japundžić Žigon
- Institute of Pharmacology, Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Belgrade, Belgrade, Serbia
| | - Mingkwan Greenwood
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
| | - Michael P Greenwood
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England
| | - David Murphy
- Molecular Neuroendocrinology Research Group, Bristol Medical School: Translational Health Sciences, Dorothy Hodgkin Building, University of Bristol, Bristol, England.
| |
|
47
|
Muraleedharan A, Vanderperre B. The endo-lysosomal system in Parkinson's disease: expanding the horizon. J Mol Biol 2023:168140. [PMID: 37148997 DOI: 10.1016/j.jmb.2023.168140] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/26/2023] [Revised: 04/22/2023] [Accepted: 04/27/2023] [Indexed: 05/08/2023]
Abstract
Parkinson's disease (PD) is the second most common neurodegenerative disorder after Alzheimer's disease, and its prevalence is increasing with age. A wealth of genetic evidence indicates that the endo-lysosomal system is a major pathway driving PD pathogenesis with a growing number of genes encoding endo-lysosomal proteins identified as risk factors for PD, making it a promising target for therapeutic intervention. However, detailed knowledge and understanding of the molecular mechanisms linking these genes to the disease are available for only a handful of them (e.g. LRRK2, GBA1, VPS35). Taking on the challenge of studying poorly characterized genes and proteins can be daunting, due to the limited availability of tools and knowledge from previous literature. This review aims at providing a valuable source of molecular and cellular insights into the biology of lesser-studied PD-linked endo-lysosomal genes, to help and encourage researchers in filling the knowledge gap around these less popular genetic players. Specific endo-lysosomal pathways discussed range from endocytosis, sorting, and vesicular trafficking to the regulation of membrane lipids of these membrane-bound organelles and the specific enzymatic activities they contain. We also provide perspectives on future challenges that the community needs to tackle and propose approaches to move forward in our understanding of these poorly studied endo-lysosomal genes. This will help harness their potential in designing innovative and efficient treatments to ultimately re-establish neuronal homeostasis in PD but also other diseases involving endo-lysosomal dysfunction.
Affiliation(s)
- Amitha Muraleedharan
- Centre d'Excellence en Recherche sur les Maladies Orphelines - Fondation Courtois and Biological Sciences Department, Université du Québec à Montréal
| | - Benoît Vanderperre
- Centre d'Excellence en Recherche sur les Maladies Orphelines - Fondation Courtois and Biological Sciences Department, Université du Québec à Montréal
| |
|
48
|
Sadegh S, Skelton J, Anastasi E, Maier A, Adamowicz K, Möller A, Kriege NM, Kronberg J, Haller T, Kacprowski T, Wipat A, Baumbach J, Blumenthal DB. Lacking mechanistic disease definitions and corresponding association data hamper progress in network medicine and beyond. Nat Commun 2023; 14:1662. [PMID: 36966134 PMCID: PMC10039912 DOI: 10.1038/s41467-023-37349-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/29/2022] [Accepted: 03/13/2023] [Indexed: 03/27/2023]
Abstract
A long-term objective of network medicine is to replace our current, mainly phenotype-based disease definitions with subtypes of health conditions corresponding to distinct pathomechanisms. For this, molecular and health data are modeled as networks and are mined for pathomechanisms. However, many such studies rely on large-scale disease association data where diseases are annotated using the very phenotype-based disease definitions the network medicine field aims to overcome. This raises the question of to what extent the biases that mechanistically inadequate disease annotations introduce into disease association data distort the results of studies that use such data for pathomechanism mining. We address this question using global- and local-scale analyses of networks constructed from disease association data of various types. Our results indicate that large-scale disease association data should be used with care for pathomechanism mining and that analyses of such data should be accompanied by close-up analyses of molecular data for well-characterized patient cohorts.
Affiliation(s)
- Sepideh Sadegh
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, Munich, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - James Skelton
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Elisa Anastasi
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Andreas Maier
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Klaudia Adamowicz
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
| | - Anna Möller
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
| | - Nils M Kriege
- Faculty of Computer Science, University of Vienna, Vienna, Austria
- Research Network Data Science, University of Vienna, Vienna, Austria
| | - Jaanika Kronberg
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Toomas Haller
- Estonian Genome Centre, Institute of Genomics, University of Tartu, Tartu, Estonia
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
| | - Anil Wipat
- School of Computing, Newcastle University, Newcastle upon Tyne, UK
| | - Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Computational Biomedicine Lab, Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
| | - David B Blumenthal
- Biomedical Network Science Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
| |
|
49
|
Zhu M, Tang M, Du Y. Identification of TAC1 Associated with Alzheimer's Disease Using a Robust Rank Aggregation Approach. J Alzheimers Dis 2023; 91:1339-1349. [PMID: 36617784 DOI: 10.3233/jad-220950] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/07/2023]
Abstract
BACKGROUND: Alzheimer's disease (AD) places a heavy burden on society and families. There is an urgent need to find effective methods for disease diagnosis and treatment. The robust rank aggregation (RRA) approach, which can aggregate the resulting gene lists, has been widely utilized in genomic data analysis. OBJECTIVE: To identify hub genes in AD using the RRA approach. METHODS: Seven microarray datasets of frontal cortex from the GEO database were used to identify differentially expressed genes (DEGs) in AD patients using the RRA approach. STRING was used to explore the protein-protein interaction (PPI) network. Gene Ontology enrichment and Kyoto Encyclopedia of Genes and Genomes pathway analyses were utilized for enrichment analysis. Human Gene Connectome and Gene Set Enrichment Analysis were used for functional annotation. Finally, the expression levels of hub genes were validated in the cortex of 5xFAD mice by quantitative real-time polymerase chain reaction. RESULTS: After RRA analysis, 473 DEGs (216 upregulated and 257 downregulated) were identified in AD samples. The PPI network of the DEGs had a total of 416 nodes and 2750 edges. These genes were divided into 17 clusters, each of which contained at least three genes. After functional annotation and enrichment analysis, TAC1 was identified as the hub gene and may be related to synaptic function and inflammation. In addition, Tac1 was found to be downregulated in the cortices of 5xFAD mice. CONCLUSION: In the current study, TAC1 was identified as a key gene in the frontal cortex in AD, providing insight into the possible pathogenesis of, and potential therapeutic targets for, this disease.
Affiliation(s)
- Min Zhu
- Department of Neurology, Shandong Provincial Hospital, Shandong University, Jinan, Shandong, People's Republic of China; Department of Neurology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, People's Republic of China
| | - Minglu Tang
- Department of Neurology, Shandong Provincial Hospital, Shandong University, Jinan, Shandong, People's Republic of China; Department of Neurology (Cognitive sleep ward), Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, People's Republic of China
| | - Yifeng Du
- Department of Neurology, Shandong Provincial Hospital, Shandong University, Jinan, Shandong, People's Republic of China; Department of Neurology, Shandong Provincial Hospital Affiliated to Shandong First Medical University, Jinan, Shandong, People's Republic of China
| |
|
50
|
Franchini L, Orlandi C. Probing the orphan receptors: Tools and directions. Prog Mol Biol Transl Sci 2023; 195:47-76. [PMID: 36707155 DOI: 10.1016/bs.pmbts.2022.06.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2023]
Abstract
The endogenous ligands activating a large fraction of the G protein-coupled receptor (GPCR) family members have yet to be identified. These receptors are commonly labeled as orphans (oGPCRs), and because of the absence of available pharmacological tools they are currently understudied. Nonetheless, genome-wide association studies, together with research using animal models, have identified many physiological functions regulated by oGPCRs. Similarly, mutations in some oGPCRs have been associated with rare genetic disorders or with an increased risk of developing pathologies. The once underestimated pharmacological potential of targeting oGPCRs is increasingly being exploited through the development of novel tools to understand their biology and through drug discovery endeavors aimed at identifying new modulators of their activity. Here, we summarize recent advancements in the field of oGPCRs and future directions.
Affiliation(s)
- Luca Franchini
- Department of Pharmacology and Physiology, University of Rochester Medical Center, Rochester, NY, United States
| | - Cesare Orlandi
- Department of Pharmacology and Physiology, University of Rochester Medical Center, Rochester, NY, United States.
| |
|