1
Serrano-Quílez J, Rodriguez-Navarro S. Unraveling gene expression: a beginner's guide from chromatin modifications to mRNA export in Saccharomyces cerevisiae. Nucleus 2025; 16:2516909. [PMID: 40509867 PMCID: PMC12169046 DOI: 10.1080/19491034.2025.2516909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2025] [Revised: 05/29/2025] [Accepted: 06/03/2025] [Indexed: 06/18/2025] Open
Abstract
Understanding gene expression requires grasping its multi-step processes, from chromatin remodeling to mRNA export. This manuscript provides an accessible entry point for PhD students and junior postdocs beginning research in this area, using yeast as a model organism. We present a beginner-friendly overview of gene expression, emphasizing the dynamic interplay between chromatin modifications, transcription, mRNA processing, and export. Key topics include chromatin organization, with a focus on H2B ubiquitylation and H3 methylation crosstalk; transcriptional control by RNA polymerase II, including initiation, elongation, and termination; and the export of mRNAs via Mex67-Mtr2, adaptor proteins, and the TREX and TREX-2 complexes at the nuclear pore complex. Relevant examples from yeast genetics, biochemistry, and structural biology illustrate each step. This overview aims to equip new researchers with foundational knowledge and provides references to key studies, current challenges, and open questions in the regulation of gene expression.
2
Dong J, Mao Z, Li H, Wang R, Wang Y, Jia H, Li J, Liu Q, Zhang C, Liao X, Liu D, Ma H, Tian C. MTD: A cloud-based omics database and interactive platform for Myceliophthora thermophila. Synth Syst Biotechnol 2025; 10:783-793. [PMID: 40276250 PMCID: PMC12018684 DOI: 10.1016/j.synbio.2025.04.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2025] [Revised: 03/05/2025] [Accepted: 04/02/2025] [Indexed: 04/26/2025] Open
Abstract
Biological databases are playing an increasingly critical role in biological research. Myceliophthora thermophila is an excellent thermophilic fungal chassis for industrial enzyme production and plant biomass-based chemical synthesis. The lack of a dedicated public database has made access to and reanalysis of M. thermophila data difficult. To bridge this gap, we developed MTD (https://mtd.biodesign.ac.cn/), a cloud-based omics database and interactive platform for M. thermophila. MTD integrates comprehensive genome annotations, sequence-based predictions, transcriptome data, curated experimental descriptions, and bioinformatics analysis tools, offering a one-stop solution with a 'top-down' search strategy to streamline M. thermophila research. The platform supports data reproduction, rapid querying, and in-depth mining of existing transcriptome datasets. Based on analyses using data and tools in MTD, we identified shifts in metabolic allocation in a glucoamylase hyperproduction strain of M. thermophila, highlighting changes in fatty acid biosynthesis and amino acid biosynthesis pathways, which provide new insights into the underlying phenotypic alterations. As a pioneering resource, MTD marks a key advancement in M. thermophila research and sets the model for developing similar databases for other species.
Affiliation(s)
- Jiacheng Dong
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Zhitao Mao
  - Biodesign Center, State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Haoran Li
  - Biodesign Center, State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Ruoyu Wang
  - Biodesign Center, State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Yutao Wang
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Haokai Jia
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
  - College of Biological Sciences, China Agricultural University, Beijing, 100193, China
- Jingen Li
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Qian Liu
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Chenglin Zhang
  - College of Biotechnology, Tianjin University of Science & Technology, Tianjin, 300457, China
- Xiaoping Liao
  - Biodesign Center, State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Defei Liu
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Hongwu Ma
  - Biodesign Center, State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Chaoguang Tian
  - State Key Laboratory of Engineering Biology for Low-Carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
3
Billmyre RB, Craig CJ, Lyon JW, Reichardt C, Kuhn AM, Eickbush MT, Zanders SE. Landscape of essential growth and fluconazole-resistance genes in the human fungal pathogen Cryptococcus neoformans. PLoS Biol 2025; 23:e3003184. [PMID: 40402997 DOI: 10.1371/journal.pbio.3003184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 04/29/2025] [Indexed: 05/24/2025] Open
Abstract
Fungi can cause devastating invasive infections, typically in immunocompromised patients. Treatment is complicated both by the evolutionary similarity between humans and fungi and by the frequent emergence of drug resistance. Studies in fungal pathogens have long been slowed by a lack of high-throughput tools and community resources that are common in model organisms. Here we demonstrate a high-throughput transposon mutagenesis and sequencing (TN-seq) system in Cryptococcus neoformans that enables genome-wide determination of gene essentiality. We employed a random forest machine learning approach to classify the C. neoformans genome as essential or nonessential, predicting 1,465 essential genes, including 302 that lack human orthologs. These genes are ideal targets for new antifungal drug development. TN-seq also enables genome-wide measurement of the fitness contribution of genes to phenotypes of interest. As proof of principle, we demonstrate the genome-wide contribution of genes to growth in fluconazole, a clinically used antifungal. We show a novel role for the well-studied RIM101 pathway in fluconazole susceptibility. We also show that insertions of transposons into the 5' upstream region can drive sensitization of essential genes, enabling screenlike assays of both essential and nonessential components of the genome. Using this approach, we demonstrate a role for mitochondrial function in fluconazole sensitivity, such that tuning down many essential mitochondrial genes via 5' insertions can drive resistance to fluconazole. Our assay system will be valuable in future studies of C. neoformans, particularly in examining the consequences of genotypic diversity.
Affiliation(s)
- R Blake Billmyre
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Georgia, United States of America
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
  - Department of Microbiology, Franklin College of Arts and Sciences, University of Georgia, Athens, Georgia, United States of America
  - Stowers Institute for Medical Research, Kansas City, Missouri, United States of America
- Caroline J Craig
  - Stowers Institute for Medical Research, Kansas City, Missouri, United States of America
- Joshua W Lyon
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
  - Department of Pharmaceutical and Biological Sciences, College of Pharmacy, University of Georgia, Athens, Georgia, United States of America
- Claire Reichardt
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Georgia, United States of America
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
  - Department of Microbiology, Franklin College of Arts and Sciences, University of Georgia, Athens, Georgia, United States of America
- Amy M Kuhn
  - Department of Genetics, Franklin College of Arts and Sciences, University of Georgia, Georgia, United States of America
  - Department of Infectious Diseases, College of Veterinary Medicine, University of Georgia, Athens, Georgia, United States of America
- Michael T Eickbush
  - Stowers Institute for Medical Research, Kansas City, Missouri, United States of America
- Sarah E Zanders
  - Stowers Institute for Medical Research, Kansas City, Missouri, United States of America
  - Department of Cell Biology and Physiology, University of Kansas Medical Center, Kansas City, Kansas, United States of America
4
Pawłowski PH, Zielenkiewicz P. Predicting the S. cerevisiae Gene Expression Score by a Machine Learning Classifier. Life (Basel) 2025; 15:723. [PMID: 40430151 PMCID: PMC12113619 DOI: 10.3390/life15050723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2025] [Revised: 04/27/2025] [Accepted: 04/28/2025] [Indexed: 05/29/2025] Open
Abstract
The topic of this work is gene expression and its score according to various factors analyzed globally using machine learning techniques. The expression score (ES) of genes characterizes their activity and, thus, their importance for cellular processes. This may depend on many different factors (attributes). To find the most important classifier, a machine learning classifier (random forest) was selected, trained, and optimized on the Waikato Environment for Knowledge Analysis (WEKA) platform, resulting in the most accurate attribute-dependent prediction of the ES of Saccharomyces cerevisiae genes. In this way, data from the Saccharomyces Genome Database (SGD), presenting ES values corresponding to a wide spectrum of attributes, were used, revised, classified, and balanced, and the significance of the considered attributes was evaluated. The novel random forest model indicates the most important attributes determining classes of low, moderate, and high ES. They cover both the experimental conditions and the genetic, physical, statistical, and logistic features. During validation, the obtained model could classify the instances of a primary unknown test set with a correctness of 84.1%.
Affiliation(s)
- Piotr H. Pawłowski
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-093 Warsaw, Poland
- Piotr Zielenkiewicz
  - Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-093 Warsaw, Poland
  - Laboratory of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland
5
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; for example, the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for development of new predictors. This comprehensive review addresses this critical gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizing them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications.
The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
  - Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
6
Li Y, Zhang J. Transcriptomic and proteomic effects of gene deletion are not evolutionarily conserved. Genome Res 2025; 35:512-521. [PMID: 39965933 PMCID: PMC11960704 DOI: 10.1101/gr.280008.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 02/07/2025] [Indexed: 02/20/2025]
Abstract
Although the textbook definition of gene function is the effect for which the gene was selected and/or by which it is maintained, gene function is commonly inferred from the phenotypic effects of deleting the gene. Because some of the deletion effects are byproducts of other effects, they may not reflect the gene's selected-effect function. To evaluate the degree to which the phenotypic effects of gene deletion inform gene function, we compare the transcriptomic and proteomic effects of systematic gene deletions in budding yeast (Saccharomyces cerevisiae) with those effects in fission yeast (Schizosaccharomyces pombe). Despite evidence for functional conservation of orthologous genes, their deletions result in no more sharing of transcriptomic or proteomic effects than that from deleting nonorthologous genes. Because the wild-type mRNA and protein levels of orthologous genes are significantly correlated between the two yeasts and because transcriptomic effects of deleting the same gene strongly overlap between studies in the same S. cerevisiae strain by different laboratories, our observation cannot be explained by rapid evolution or large measurement error of gene expression. Analysis of transcriptomic and proteomic effects of gene deletions in multiple S. cerevisiae strains by the same laboratory reveals a high sensitivity of these effects to the genetic background, explaining why these effects are not evolutionarily conserved. Together, our results suggest that most transcriptomic and proteomic effects of gene deletion do not inform selected-effect function. This finding has important implications for assessing and/or understanding gene function, pleiotropy, and biological complexity.
Affiliation(s)
- Yang Li
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan 48109, USA
7
Schreiber G, Rueda F, Renner F, Polat AF, Lorenz P, Klipp E. Expression Dynamics and Genetic Compensation of Cell Cycle Paralogues in Saccharomyces cerevisiae. Cells 2025; 14:412. [PMID: 40136661 PMCID: PMC11941160 DOI: 10.3390/cells14060412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2025] [Revised: 03/04/2025] [Accepted: 03/06/2025] [Indexed: 03/27/2025] Open
Abstract
Cell cycle progression of the yeast Saccharomyces cerevisiae is largely driven by the expression of cyclins, which in turn bind the cyclin-dependent kinase CDK1, providing specificity. Due to the duplication of the yeast genome during evolution, most of the cyclins are present as a pair of paralogues, which are considered to have similar functions and periods of expression. Here, we use single molecule inexpensive fluorescence in situ hybridization (smiFISH) to measure the expression of five pairs of paralogous genes relevant for cell cycle progression (CLN1/CLN2, CLB5/CLB6, CLB3/CLB4, CLB1/CLB2 and ACE2/SWI5) in a large number of unsynchronized single cells representing all cell cycle phases. We systematically compare their expression patterns and strengths. In addition, we analyze the effect of the knockout of one member of each pair on the expression of the other gene. In order to classify cells into specific cell cycle phases, we developed a convolutional neural network (CNN). We find that the expression levels of some cell-cycle related paralogues differ in their correlation, with CLN1 and CLN2 showing strong correlation and CLB3 and CLB4 showing the weakest correlation. The temporal profiles of some pairs also differ. Upon deletion of their paralogue, CLB1 and CLB2 seem to compensate for the expression of the other gene, while this was not observed for ACE2/SWI5. Interestingly, CLB1 and CLB2 also seem to share work between mother and bud in the G2 phase, where CLB2 is primarily expressed in the bud and CLB1 in the mother. Taken together, our results suggest that paralogues related to yeast cell cycle progression should not be considered as the same but differ both in their expression strength and timing as well as in their precise role in cell cycle regulation.
Affiliation(s)
- Edda Klipp
  - Theoretical Biophysics, Humboldt-Universität zu Berlin, Invalidenstr. 42, 10115 Berlin, Germany; (G.S.); (F.R.); (F.R.); (A.F.P.); (P.L.)
8
Zhao H, Xu H, Wang T, Liu G. Constructing multilayer PPI networks based on homologous proteins and integrating multiple PageRank to identify essential proteins. BMC Bioinformatics 2025; 26:80. [PMID: 40059137 PMCID: PMC11892321 DOI: 10.1186/s12859-025-06093-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/22/2025] [Accepted: 02/21/2025] [Indexed: 05/13/2025] Open
Abstract
BACKGROUND Predicting and studying essential proteins not only helps to understand the fundamental requirements for cell survival and growth regulation mechanisms but also deepens our understanding of disease mechanisms and drives drug development. Existing methods for identifying essential proteins primarily focus on PPI networks within a single species, without fully exploiting interspecies homologous relationships. These homologous relationships connect proteins from different species, forming multilayer PPI networks. Some methods only construct interlayer edges based on homologous relationships between two species, without incorporating appropriate biological attributes to assess the biological significance of these edges. Furthermore, homologous proteins are often highly conserved across multiple species, and expanding homologous relationships to more species allows for a more accurate assessment of interlayer edge importance. RESULTS To address these issues, we propose a novel model, MLPR, which constructs a multilayer PPI network based on homologous proteins and integrates multiple PageRank algorithms to identify essential proteins. This study combines homologous protein data from three species to construct interlayer transition matrices and assigns weights to interlayer edges by integrating the biological attributes of homologous proteins and cross-species GO annotations. The MLPR model uses multiple PageRank methods to comprehensively consider homologous relationships across species and designs three key parameters to find the optimal combination that balances random walks within layers, global jumps, interlayer biases, and interspecies homologous relationships. CONCLUSIONS Experimental results show that MLPR outperforms other state-of-the-art methods in terms of performance. 
Ablation experiments further validate that integrating homologous relationships across three species effectively enhances the overall performance of MLPR and demonstrates the advantages of the multiple PageRank model in identifying essential proteins.
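As a rough illustration of the kind of computation this abstract describes (a random walk with global jumps over protein layers joined by homology edges), the following minimal sketch runs ordinary PageRank on a toy two-layer network. It is not the authors' MLPR algorithm: the protein names, the single homology weight of 0.5, the damping factor of 0.85, and the helper names `build_multilayer` and `pagerank` are all assumptions made for illustration.

```python
def build_multilayer(layers, homologs, interlayer_weight):
    """Merge per-species edge lists into one weighted, undirected graph.

    layers: list of edge lists, one per species (hypothetical toy data).
    homologs: cross-species protein pairs joined by an interlayer edge.
    interlayer_weight: assumed weight for every homology edge.
    """
    adj = {}
    for edges in layers:
        for u, v in edges:
            adj.setdefault(u, {})[v] = 1.0
            adj.setdefault(v, {})[u] = 1.0
    for u, v in homologs:
        adj.setdefault(u, {})[v] = interlayer_weight
        adj.setdefault(v, {})[u] = interlayer_weight
    return adj


def pagerank(adj, damping=0.85, tol=1e-12, max_iter=500):
    """Standard power-iteration PageRank on a weighted adjacency dict."""
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Global jump: every node receives an equal share of (1 - damping).
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = sum(adj[u].values())
            if out == 0:
                # Dangling node: spread its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                # Walk step: follow edges in proportion to their weight.
                for v, w in adj[u].items():
                    new[v] += damping * rank[u] * w / out
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy data: species A has hub protein a1; species B has b1 and b2;
# a1 and b1 are treated as homologs.
layer_a = [("a1", "a2"), ("a1", "a3")]
layer_b = [("b1", "b2")]
graph = build_multilayer([layer_a, layer_b], [("a1", "b1")], interlayer_weight=0.5)
ranks = pagerank(graph)
for protein in sorted(ranks, key=ranks.get, reverse=True):
    print(protein, round(ranks[protein], 4))
```

In this toy setting the interlayer weight is the only knob linking the two species; the abstract describes three tuned parameters and biologically derived edge weights, which this sketch does not attempt to reproduce.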
Affiliation(s)
- He Zhao
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Huan Xu
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Tao Wang
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Guixia Liu
  - College of Computer Science and Technology, Jilin University, Changchun, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
9
Kim T, Song J, Joo JWJ. MARSweb: a fully automated web service for set-based association testing. BMC Genomics 2025; 26:193. [PMID: 39994572 PMCID: PMC11853308 DOI: 10.1186/s12864-025-11356-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Accepted: 02/11/2025] [Indexed: 02/26/2025] Open
Abstract
BACKGROUND Despite the successes in GWAS, there is still a large gap between the known heritability and the part explained by the SNPs identified by GWAS. Set-based analysis is one of the approaches that has tried to identify associations between multiple variants in a locus and a trait, leveraging allelic heterogeneity to increase power in association testing. MARS is a set-based analysis method that integrates a likelihood ratio test with a recently developed fine mapping technique to accurately account for the causal status of variants in a risk locus. Unfortunately, due to its complex running process, time complexity, and the requirement of high-performance computing resources, it is not widely used. RESULTS To address these issues, we proposed a fully automated web-based analysis service, MARSweb. By providing a web service, we minimized the effort required for initial configuration. Additionally, users can perform analyses by simply uploading their data without needing to familiarize themselves with intricate analysis procedures. Furthermore, it facilitates easier interpretation of results by integrating advanced visualization tools. We confirmed the performance of MARSweb by detecting eGenes and performing pathway analysis of the genes using a yeast dataset. CONCLUSIONS MARSweb is a web-based analysis service that fully automates set-based analysis. It offers an intuitive user interface, making complex analyses more accessible while significantly reducing processing time for enhanced efficiency. MARSweb is available for use at http://cblab.dongguk.edu/MARSweb and its source code is available at https://github.com/DGU-CBLAB/MARSweb.
Affiliation(s)
- Taegun Kim
  - Division of AI Software Convergence, Dongguk University-Seoul, Seoul, 04620, South Korea
  - Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620, South Korea
- Jaeseung Song
  - Department of Life Science, Dongguk University-Seoul, Seoul, 04620, South Korea
- Jong Wha J Joo
  - Division of AI Software Convergence, Dongguk University-Seoul, Seoul, 04620, South Korea
  - Department of Computer Science and Engineering, Dongguk University-Seoul, Seoul, 04620, South Korea
10
Pinto J, Balarezo-Cisneros LN, Delneri D. Exploring adaptation routes to cold temperatures in the Saccharomyces genus. PLoS Genet 2025; 21:e1011199. [PMID: 39970180 PMCID: PMC11875353 DOI: 10.1371/journal.pgen.1011199] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2024] [Revised: 03/03/2025] [Accepted: 02/06/2025] [Indexed: 02/21/2025] Open
Abstract
The identification of traits that affect adaptation of microbial species to external abiotic factors, such as temperature, is key for our understanding of how biodiversity originates and can be maintained in a constantly changing environment. The Saccharomyces genus, which includes eight species with different thermotolerant profiles, represents an ideal experimental platform to study the impact of adaptive alleles in different genetic backgrounds. Previous studies identified a group of adaptive genes for maintenance of growth at lower temperatures. Here, we carried out a genus-wide assessment of the role of genes partially responsible for cold-adaptation in all eight Saccharomyces species for six candidate genes. We showed that the cold tolerance trait of S. kudriavzevii and S. eubayanus is likely to have evolved from different routes, involving genes important for the conservation of redox-balance, and for the long-chain fatty acid metabolism, respectively. For several loci, temperature- and species-dependent epistasis was detected, underscoring the plasticity and complexity of the genetic interactions. The natural isolates of S. kudriavzevii, S. jurei and S. mikatae had a significantly higher expression of the genes involved in the redox balance compared to S. cerevisiae, suggesting a role at transcriptional level. To distinguish the effects of gene expression from allelic variation, we independently replaced either the promoters or the coding sequences (CDS) of two genes in four yeast species with those derived from S. kudriavzevii. Our data consistently showed a significant fitness improvement at cold temperatures in the strains carrying the S. kudriavzevii promoter, while growth was lower upon CDS swapping. These results suggest that transcriptional strength plays a bigger role than the CDS in growth maintenance at cold temperatures and support a model of adaptation centred on stochastic tuning of the expression network.
Affiliation(s)
- Javier Pinto
  - Faculty of Biology Medicine and Health, Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom
- Laura Natalia Balarezo-Cisneros
  - Faculty of Biology Medicine and Health, Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom
- Daniela Delneri
  - Faculty of Biology Medicine and Health, Manchester Institute of Biotechnology, The University of Manchester, Manchester, United Kingdom
11
Miao Z, Ren Y, Tarabini A, Yang L, Li H, Ye C, Liti G, Fischer G, Li J, Yue JX. ScRAPdb: an integrated pan-omics database for the Saccharomyces cerevisiae reference assembly panel. Nucleic Acids Res 2025; 53:D852-D863. [PMID: 39470715 PMCID: PMC11701598 DOI: 10.1093/nar/gkae955] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2024] [Revised: 10/05/2024] [Accepted: 10/10/2024] [Indexed: 10/30/2024] Open
Abstract
As a unicellular eukaryote, the budding yeast Saccharomyces cerevisiae strikes a unique balance between biological complexity and experimental tractability, serving as a long-standing classic model for both basic and applied studies. Recently, S. cerevisiae has further emerged as a leading system for studying the natural diversity of genome evolution and its associated functional implications at population scales. Having high-quality comparative and functional genomics data is critical for such efforts. Here, we exhaustively expanded the telomere-to-telomere (T2T) S. cerevisiae reference assembly panel (ScRAP) that we previously constructed for 142 strains to cover high-quality genome assemblies and annotations of 264 S. cerevisiae strains from diverse geographical and ecological niches and also 33 outgroup strains from all other species in the Saccharomyces species complex. We created a dedicated online database, ScRAPdb (https://www.evomicslab.org/db/ScRAPdb/), to host this expanded pangenome collection. Furthermore, ScRAPdb also integrates an array of population-scale pan-omics atlases (pantranscriptome, panproteome and panphenome) and extensive data exploration toolkits for intuitive genomics analyses. All curated data and downstream analysis results can be easily downloaded from ScRAPdb. We expect ScRAPdb to become a highly valuable platform for the yeast community and beyond, leading to a pan-omics understanding of the global genetic and phenotypic diversity.
Affiliation(s)
- Zepu Miao
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Yifan Ren
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Andrea Tarabini
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, 7-9 Quai Saint Bernard, Paris 75005, France
- Ludong Yang
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Huihui Li
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Chang Ye
  - Department of Chemistry, University of Chicago, 929 E 57th Street, Chicago, IL 60637, USA
- Gianni Liti
  - CNRS, INSERM, IRCAN, Université Côte d’Azur, 28 Avenue de Valombrose, Nice 06107, France
- Gilles Fischer
  - Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratory of Computational and Quantitative Biology, 7-9 Quai Saint Bernard, Paris 75005, France
- Jing Li
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
- Jia-Xing Yue
  - State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat-sen University Cancer Center, 651 Dongfeng East Road, Guangzhou 510060, China
|
12
|
Yan R, Islam MT, Xing L. Deep representation learning of protein-protein interaction networks for enhanced pattern discovery. Science Advances 2024; 10:eadq4324. [PMID: 39693438 DOI: 10.1126/sciadv.adq4324] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 11/14/2024] [Indexed: 12/20/2024]
Abstract
Protein-protein interaction (PPI) networks, where nodes represent proteins and edges depict myriad interactions among them, are fundamental to understanding the dynamics within biological systems. Despite their pivotal role in modern biology, reliably discerning patterns from these intertwined networks remains a substantial challenge. The essence of the challenge lies in holistically characterizing the relationships of each node with others in the network and effectively using this information for accurate pattern discovery. In this work, we introduce a self-supervised network embedding framework termed discriminative network embedding (DNE). Unlike conventional methods that primarily focus on direct or limited-order node proximity, DNE characterizes a node both locally and globally by harnessing the contrast between representations from neighboring and distant nodes. Our experimental results demonstrate DNE's superior performance over existing techniques across various critical network analyses, including PPI inference and the identification of protein functional modules. DNE emerges as a robust strategy for node representation in PPI networks, offering promising avenues for diverse biomedical applications.
Affiliation(s)
- Rui Yan
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
- Md Tauhidul Islam
  - Department of Radiation Oncology, Stanford University, Stanford, CA 94305, USA
- Lei Xing
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA
  - Department of Radiation Oncology, Stanford University, Stanford, CA 94305, USA
  - Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
|
13
|
Wu C, Lin B, Zhang J, Gao R, Song R, Liu ZP. AttentionEP: Predicting essential proteins via fusion of multiscale features by attention mechanisms. Comput Struct Biotechnol J 2024; 23:4315-4323. [PMID: 39697678 PMCID: PMC11652892 DOI: 10.1016/j.csbj.2024.11.039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2024] [Revised: 11/17/2024] [Accepted: 11/25/2024] [Indexed: 12/20/2024] Open
Abstract
Identifying essential proteins is of utmost importance in the field of biomedical research due to their essential functions in cellular activities and their involvement in mechanisms related to diseases. In this research, a novel approach called AttentionEP, which predicts essential proteins (EP) using attention mechanisms, is introduced. This method leverages both cross-attention and self-attention frameworks, focusing on enhancing prediction accuracy through the integration of features across diverse scales. Spatial characteristics of proteins are obtained from the protein-protein interaction (PPI) network by employing Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Following this, Bidirectional Long Short-Term Memory networks (BiLSTM) are employed to derive temporal features from gene expression datasets. Furthermore, spatial characteristics are derived by integrating data on subcellular localization with the application of Deep Neural Networks (DNN). In order to effectively integrate features across multiple scales, initial steps involve the application of self-attention techniques to derive essential insights from each unique data set. Following this, mechanisms involving self-attention and cross-attention are employed to enhance the interaction between diverse information sources. To identify essential proteins, a classifier based on the ResNet architecture is developed. The findings from the experiments indicate that the method introduced here shows superior performance in identifying essential proteins, recording an Area Under the Curve (AUC) value of 0.9433. This approach shows a considerable advantage over established techniques. The findings of this study provide a significant advancement in the comprehension of critical proteins, revealing promising potential for applications in the development of therapeutics and addressing various diseases.
Affiliation(s)
- Chuanyan Wu
  - School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Bentao Lin
  - School of Intelligent Engineering, Shandong Management University, No.3500 Dingxiang Road, Jinan, Shandong, 250357, China
- Jialin Zhang
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Gao
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Rui Song
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
- Zhi-Ping Liu
  - School of Control Science and Engineering, Shandong University, No.17923 Jingshi Road, Jinan, Shandong, 250061, China
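The AttentionEP abstract above reports its performance as an Area Under the Curve (AUC) value. As a reading aid, the following is a minimal sketch of how a ROC AUC can be computed from predicted scores, using the pairwise (Mann-Whitney) definition: the fraction of positive/negative pairs in which the positive example receives the higher score. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """ROC AUC via the pairwise (Mann-Whitney) definition.

    labels: iterable of 0/1 values (1 = essential protein, by assumption here).
    scores: iterable of predicted scores, higher meaning "more likely essential".
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): three essential and three non-essential proteins.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

On the toy data, eight of the nine positive/negative pairs are ordered correctly. The quadratic loop is chosen for readability; a rank-based formulation would be preferable for large datasets.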
|
14
|
Wang S, Cui H, Qu Y, Zhang Y. Multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork embedding for protein complex identification. Brief Bioinform 2024; 26:bbae718. [PMID: 39814560 PMCID: PMC11735048 DOI: 10.1093/bib/bbae718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2024] [Revised: 12/19/2024] [Accepted: 12/30/2024] [Indexed: 01/18/2025] Open
Abstract
Identifying biologically significant protein complexes from protein-protein interaction (PPI) networks and understanding their roles are essential for elucidating protein functions, life processes, and disease mechanisms. Current methods typically rely on static PPI networks and model PPI data as pairwise relationships, which presents several limitations. Firstly, static PPI networks do not adequately represent the scopes and temporal dynamics of protein interactions. Secondly, a large amount of available biological resources have not been fully integrated. Moreover, PPIs in biological systems are not merely one-to-one relationships but involve higher order non-pairwise interactions. To alleviate these issues, we propose HGST, a multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork (subnet) embedding method for identifying biologically significant protein complexes from PPI networks. HGST initially constructs spatiotemporal PPI subnets using the scopes and temporal dynamics of proteins derived from multi-source biological knowledge, treating them as dynamic networks through fine-grained spatiotemporal partitioning. The spatiotemporal subnets are then transformed into hypergraphs, which model higher order non-pairwise relationships via hypergraph embedding. Simultaneously, fine-grained amino acid sequence features and coarse-grained gene ontology attributes are introduced for multi-dimensional feature fusion. Finally, protein complexes are identified from the reweighted subnets based on fused feature representations using the core-attachment strategy. Evaluations on four real PPI datasets demonstrate that HGST achieves competitive performance. Furthermore, a series of biological analyses confirm the high biological significance of the complexes identified by HGST. The source code is available at https://github.com/qifen37/HGST.
Affiliation(s)
- Shilong Wang
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Hai Cui
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Yanchen Qu
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
- Yijia Zhang
  - Information Science and Technology College, Dalian Maritime University, No.1 Linghai Road, 116026, Dalian, Liaoning, China
|
15
|
Chen J, Mirvis M, Ekman A, Vanslembrouck B, Le Gros M, Larabell C, Marshall WF. Automated segmentation of soft X-ray tomography: native cellular structure with sub-micron resolution at high throughput for whole-cell quantitative imaging in yeast. bioRxiv 2024:2024.10.31.621371. [PMID: 39554159 PMCID: PMC11565976 DOI: 10.1101/2024.10.31.621371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2024]
Abstract
Soft X-ray tomography (SXT) is an invaluable tool for quantitatively analyzing cellular structures at sub-optical isotropic resolution. However, it has traditionally depended on manual segmentation, limiting its scalability for large datasets. Here, we leverage a deep learning-based auto-segmentation pipeline to segment and label cellular structures in hundreds of cells across three Saccharomyces cerevisiae strains. This task-based pipeline employs manual iterative refinement to improve segmentation accuracy for key structures, including the cell body, nucleus, vacuole, and lipid droplets, enabling high-throughput and precise phenotypic analysis. Using this approach, we quantitatively compared the 3D whole-cell morphometric characteristics of wild-type, VPH1-GFP, and vac14 strains, uncovering detailed strain-specific cell and organelle size and shape variations. We show the utility of SXT data for precise 3D curvature analysis of entire organelles and cells and detection of fine morphological features using surface meshes. Our approach facilitates comparative analyses with high spatial precision and statistical throughput, uncovering subtle morphological features at the single cell and population level. This workflow significantly enhances our ability to characterize cell anatomy and supports scalable studies on the mesoscale, with applications in investigating cellular architecture, organelle biology, and genetic research across diverse biological contexts.
Significance Statement
Soft X-ray tomography offers many powerful features for whole-cell multi-organelle imaging, but, like other high resolution volumetric imaging modalities, is typically limited by low throughput due to laborious segmentation. Auto-segmentation for soft X-ray tomography overcomes this limitation, enabling statistical 3D morphometric analysis of multiple organelles in whole cells across cell populations. The combination of high 3D resolution of SXT data with statistically useful throughput represents an avenue for more thorough characterizations of cells in toto and opens new mesoscale biological questions and statistical whole-cell modeling of organelle and cell morphology, interactions, and responses to perturbations.
|
16
|
Wegner SA, Avalos JL. Mevalonate secretion is not mediated by a singular non-essential transporter in Saccharomyces cerevisiae. Biotechnology Notes (Amsterdam, Netherlands) 2024; 5:140-150. [PMID: 39498316 PMCID: PMC11532745 DOI: 10.1016/j.biotno.2024.10.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2024] [Revised: 10/12/2024] [Accepted: 10/13/2024] [Indexed: 11/07/2024]
Abstract
Isoprenoids are highly valued targets for microbial chemical production, allowing the creation of fragrances, biofuels, and pharmaceuticals from renewable carbon feedstocks. To increase isoprenoid production, previous efforts have manipulated pyruvate dehydrogenase (PDH) bypass pathway flux to increase cytosolic acetyl-CoA; however, this results in mevalonate secretion and does not necessarily translate into higher isoprenoid production. Identification and disruption of the transporter mediating mevalonate secretion would allow us to determine whether increasing PDH bypass activity in the absence of secretion improves conversion of mevalonate into downstream isoprenoids. We attempted to identify the mevalonate transporter using a pooled CRISPR library targeting all nonessential transporters and two different screening methods. Using a high throughput screen, based on growth of a mevalonate auxotrophic Escherichia coli strain, it was found that ZRT3 disruption largely abolished accumulation of extracellular mevalonate. However, disruption of ZRT3 was found to lower overall mevalonate pathway activity, rather than prevent secretion, indicating a previously unreported interaction between zinc availability and the mevalonate pathway. In a second screen, significant differences in PDR5/15 and QDR1/2 library representation were found between wild-type and mevalonate secreting Saccharomyces cerevisiae strains. However, no single deletion (or selected pair of double deletions) abolished mevalonate secretion, indicating that this process appears to be mediated through multiple redundant transporters.
Affiliation(s)
- Scott A. Wegner
  - Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
- José L. Avalos
  - Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, 08544, USA
  - The Andlinger Center for Energy and the Environment, Princeton University, Princeton, NJ, 08544, USA
  - High Meadows Environmental Institute, Princeton University, Princeton, NJ, 08544, USA
|
17
|
Romero-Pérez PS, Moran HM, Horani A, Truong A, Manriquez-Sandoval E, Ramirez JF, Martinez A, Gollub E, Hunter K, Lotthammer JM, Emenecker RJ, Liu H, Iwasa JH, Boothby TC, Holehouse AS, Fried SD, Sukenik S. Protein surface chemistry encodes an adaptive tolerance to desiccation. bioRxiv 2024:2024.07.28.604841. [PMID: 39131385 PMCID: PMC11312438 DOI: 10.1101/2024.07.28.604841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]
Abstract
Cellular desiccation - the loss of nearly all water from the cell - is a recurring stress in an increasing number of ecosystems that can drive protein unfolding and aggregation. For cells to survive, at least some of the proteome must resume function upon rehydration. Which proteins tolerate desiccation, and the molecular determinants that underlie this tolerance, are largely unknown. Here, we apply quantitative and structural proteomic mass spectrometry to show that certain proteins possess an innate capacity to tolerate rehydration following extreme water loss. Structural analysis points to protein surface chemistry as a key determinant for desiccation tolerance, which we test by showing that rational surface mutants can convert a desiccation sensitive protein into a tolerant one. Desiccation tolerance also has strong overlap with cellular function, with highly tolerant proteins responsible for production of small molecule building blocks, and intolerant proteins involved in energy-consuming processes such as ribosome biogenesis. As a result, the rehydrated proteome is preferentially enriched with metabolite and small molecule producers and depleted of some of the cell's heaviest consumers. We propose this functional bias enables cells to kickstart their metabolism and promote cell survival following desiccation and rehydration.
Affiliation(s)
- Haley M. Moran
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Azeem Horani
  - Quantitative and Systems Biology Program, University of California Merced, Merced, CA 95343, USA
- Alexander Truong
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Edgar Manriquez-Sandoval
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- John F. Ramirez
  - Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
- Alec Martinez
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Edith Gollub
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
- Kara Hunter
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
  - Department of Chemistry, Syracuse University, Syracuse, NY 13244, USA
- Jeffrey M. Lotthammer
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Ryan J. Emenecker
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Hui Liu
  - Department of Biochemistry, University of Utah, Salt Lake City, UT 84112, USA
- Janet H. Iwasa
  - Department of Biochemistry, University of Utah, Salt Lake City, UT 84112, USA
- Thomas C. Boothby
  - Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
- Alex S. Holehouse
  - Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, MO 63110, USA
  - Center for Biomolecular Condensates (CBC), Washington University in St. Louis, St. Louis, MO 63130, USA
- Stephen D. Fried
  - Department of Chemistry, Johns Hopkins University, Baltimore, Maryland 21218, USA
  - T. C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Shahar Sukenik
  - Dept of Chemistry and Biochemistry, University of California Merced, Merced, CA 95343, USA
  - Quantitative and Systems Biology Program, University of California Merced, Merced, CA 95343, USA
  - Department of Chemistry, Syracuse University, Syracuse, NY 13244, USA
|
18
|
Lawson S, Donovan D, Lefevre J. An application of node and edge nonlinear hypergraph centrality to a protein complex hypernetwork. PLoS One 2024; 19:e0311433. [PMID: 39361678 PMCID: PMC11449304 DOI: 10.1371/journal.pone.0311433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2024] [Accepted: 09/12/2024] [Indexed: 10/05/2024] Open
Abstract
The use of graph centrality measures applied to biological networks, such as protein interaction networks, underpins much research into identifying key players within biological processes. This approach, however, is restricted to dyadic interactions, and it is well known that in many instances interactions are polyadic. In this study we illustrate the merit of using hypergraph centrality applied to a hypernetwork as an alternative. Specifically, we review and propose an extension to a recently introduced node and edge nonlinear hypergraph centrality model which provides mutually dependent node and edge centralities. A Saccharomyces cerevisiae protein complex hypernetwork is used as an example application with nodes representing proteins and hyperedges representing protein complexes. The resulting rankings of the nodes and edges are considered to see if they provide insight into the essentiality of the proteins and complexes. We find that certain variations of the model predict essentiality more accurately and that the degree-based variation illustrates that the centrality-lethality rule extends to a hypergraph setting. In particular, through exploitation of the model's flexibility, we identify small sets of proteins densely populated with essential proteins. One of the key advantages of applying this model to a protein complex hypernetwork is that it also provides a classification method for protein complexes, unlike previous approaches which are only concerned with classifying proteins.
Affiliation(s)
- Sarah Lawson
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- Diane Donovan
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
- James Lefevre
  - ARC Centre of Excellence, Plant Success in Nature and Agriculture, School of Mathematics and Physics, The University of Queensland, Brisbane, Queensland, Australia
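The Lawson et al. abstract above describes mutually dependent node and hyperedge centralities on a hypernetwork in which nodes are proteins and hyperedges are protein complexes. The sketch below illustrates only the simplest linear version of that idea (a protein scores highly if it sits in high-scoring complexes, and a complex scores highly if it contains high-scoring proteins), solved by power iteration. It is not the nonlinear model proposed in the paper; the function name `hypergraph_centrality` and the toy complexes are hypothetical.

```python
def hypergraph_centrality(hyperedges, iterations=100):
    """Linear node/hyperedge centrality by power iteration (illustrative only).

    hyperedges: dict mapping a complex name to the set of its member proteins.
    Returns (node_scores, edge_scores), each normalised to sum to 1.
    """
    nodes = sorted({p for members in hyperedges.values() for p in members})
    x = {p: 1.0 for p in nodes}         # protein (node) scores
    y = {e: 1.0 for e in hyperedges}    # complex (hyperedge) scores

    for _ in range(iterations):
        # A complex is scored by the total score of its member proteins.
        y = {e: sum(x[p] for p in members) for e, members in hyperedges.items()}
        # A protein is scored by the total score of the complexes containing it.
        x = {p: sum(y[e] for e, members in hyperedges.items() if p in members)
             for p in nodes}
        # Normalise so the scores stay bounded and comparable across iterations.
        total_x, total_y = sum(x.values()), sum(y.values())
        x = {p: v / total_x for p, v in x.items()}
        y = {e: v / total_y for e, v in y.items()}
    return x, y


# Toy hypernetwork (hypothetical names): protein "C" belongs to every complex.
complexes = {
    "complex_1": {"A", "B", "C"},
    "complex_2": {"C", "D"},
    "complex_3": {"A", "C", "E"},
}
node_scores, edge_scores = hypergraph_centrality(complexes)
top_protein = max(node_scores, key=node_scores.get)
```

In this toy example the protein shared by all three complexes receives the highest node score, and each complex receives its own score as well, which mirrors the abstract's point that a hypergraph model can rank complexes and not just proteins.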
|
19
|
Lu P, Tian J. ACDMBI: A deep learning model based on community division and multi-source biological information fusion predicts essential proteins. Comput Biol Chem 2024; 112:108115. [PMID: 38865861 DOI: 10.1016/j.compbiolchem.2024.108115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 05/15/2024] [Accepted: 05/28/2024] [Indexed: 06/14/2024]
Abstract
Accurately identifying essential proteins is vital for drug research and disease diagnosis. Traditional centrality methods and machine learning approaches often face challenges in accurately discerning essential proteins, primarily relying on information derived from protein-protein interaction (PPI) networks. Despite attempts by some researchers to integrate biological data and PPI networks for predicting essential proteins, designing effective integration methods remains a challenge. To address these challenges, this paper presents the ACDMBI model, which comprises two key modules: feature extraction and classification. For feature extraction, we draw on three distinct data sources. First, structural features of proteins are extracted from the PPI network through community division, and these features are further optimized using Graph Convolutional Networks (GCN) and Graph Attention Networks (GAT). Second, protein features are extracted from gene expression data utilizing Bidirectional Long Short-Term Memory networks (BiLSTM) and a multi-head self-attention mechanism. Third, protein features are derived by mapping subcellular localization data to a one-dimensional vector and processing it through fully connected layers. In the classification phase, we integrate features extracted from the three data sources, crafting a multi-layer deep neural network (DNN) for protein classification prediction. Experimental results on brewing yeast data showcase the ACDMBI model's superior performance, with AUC reaching 0.9533 and AUPR reaching 0.9153. Ablation experiments further reveal that the effective integration of features from diverse biological information significantly boosts the model's performance.
Affiliation(s)
- Pengli Lu
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
- Jialong Tian
  - School of Computer and Communication, Lanzhou University of Technology, Lanzhou 730050, China
|
20
|
Liu L, Liu Y, Min L, Zhou Z, He X, Xie Y, Cao W, Deng S, Lin X, He X, Chen X. Most Pleiotropic Effects of Gene Knockouts Are Evolutionarily Transient in Yeasts. Mol Biol Evol 2024; 41:msae189. [PMID: 39238468 DOI: 10.1093/molbev/msae189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 07/12/2024] [Accepted: 08/30/2024] [Indexed: 09/07/2024] Open
Abstract
Pleiotropy, the phenomenon in which a single gene influences multiple traits, is a fundamental concept in genetics. However, the evolutionary mechanisms underlying pleiotropy require further investigation. In this study, we conducted parallel gene knockouts targeting 100 transcription factors in 2 strains of Saccharomyces cerevisiae. We systematically examined and quantified the pleiotropic effects of these knockouts on gene expression levels for each transcription factor. Our results showed that the knockout of a single gene generally affected the expression levels of multiple genes in both strains, indicating various degrees of pleiotropic effects. Strikingly, the pleiotropic effects of the knockouts changed rapidly between strains with different genetic backgrounds, and ∼85% of them were nonconserved. Further analysis revealed that the conserved effects tended to be functionally associated with the deleted transcription factors, while the nonconserved effects appeared to be more ad hoc responses. In addition, we measured 184 yeast cell morphological traits in these knockouts and found consistent patterns. In order to investigate the evolutionary processes underlying pleiotropy, we examined the pleiotropic effects of standing genetic variations in a population consisting of ∼1,000 hybrid progenies of the 2 strains. We observed that newly evolved expression quantitative trait loci impacted the expression of a greater number of genes than did old expression quantitative trait loci, suggesting that natural selection is gradually eliminating maladaptive or slightly deleterious pleiotropic responses. Overall, our results show that, although prevalent for new mutations, the majority of pleiotropic effects observed are evolutionarily transient, which explains how evolution proceeds despite complicated pleiotropic effects.
Affiliation(s)
- Li Liu
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yao Liu
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Lulu Min
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Zhenzhen Zhou
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xingxing He
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- YunHan Xie
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Evolutionary Ecology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany
- Waifang Cao
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Shuyun Deng
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xiaoju Lin
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
- Xionglei He
  - MOE Key Laboratory of Gene Function and Regulation, State Key Laboratory of Biocontrol, Innovation Center for Evolutionary Synthetic Biology, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Xiaoshu Chen
  - Department of Immunology and Microbiology, Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Tropical Disease Control, Ministry of Education, Sun Yat-sen University, Guangzhou, China
21
Yang W, Zou S, Gao H, Wang L, Ni W. A Novel Method for Targeted Identification of Essential Proteins by Integrating Chemical Reaction Optimization and Naive Bayes Model. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1274-1286. [PMID: 38536675 DOI: 10.1109/tcbb.2024.3382392] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2024]
Abstract
Targeted identification of essential proteins is of great significance for species identification, drug manufacturing, and disease treatment. It is a challenge to analyze the binding mechanism between essential proteins and improve the identification speed while ensuring the accuracy of the identification. This paper proposes a novel method called EPCRO for identifying essential proteins, which incorporates the chemical reaction optimization (CRO) algorithm and the naive Bayes model to effectively detect essential proteins. In EPCRO, the naive Bayes model is employed to analyze the homogeneity between proteins. To improve the identification rate and speed of essential proteins, the protein homogeneity rate is integrated into the CRO algorithm to balance local and global searches. EPCRO is experimentally compared with 17 existing methods (DC, SC, IC, EC, LAC, NC, PeC, WDC, EPD-RW, RWHN, TEGS, CFMM, BSPM, AFSO-EP, CVIM, RWEP, and EPPSO-DC) on biological datasets. The results show that EPCRO is superior to the above methods in identification accuracy and speed.
22
Patel LA, Cao Y, Mendenhall EM, Benner C, Goren A. The Wild West of spike-in normalization. Nat Biotechnol 2024; 42:1343-1349. [PMID: 39271835 DOI: 10.1038/s41587-024-02377-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/15/2024]
Affiliation(s)
- Lauren A Patel
  - Department of Bioengineering, Jacobs School of Engineering, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA
- Yuwei Cao
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, USA
- Christopher Benner
  - Department of Medicine, Division of Endocrinology & Metabolism, University of California San Diego, La Jolla, CA, USA
- Alon Goren
  - Department of Medicine, Division of Genomics & Precision Medicine, University of California San Diego, La Jolla, CA, USA
23
Zhao Y, Han Z, Zhu X, Chen B, Zhou L, Liu X, Liu H. Yeast Proteins: Proteomics, Extraction, Modification, Functional Characterization, and Structure: A Review. Journal of Agricultural and Food Chemistry 2024; 72:18774-18793. [PMID: 39146464 DOI: 10.1021/acs.jafc.4c04821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Proteins are essential for human tissues and organs, and they require adequate intake for normal physiological functions. With a growing global population, protein demand rises annually. Traditional animal and plant protein sources rely heavily on land and water, making it difficult to meet the increasing demand. The high protein content of yeast and the complete range of amino acids in yeast proteins make it a high-quality source of supplemental protein. Screening of high-protein yeast strains using proteomics is essential to increase the value of yeast protein resources and to promote the yeast protein industry. However, current yeast extraction methods are mainly alkaline solubilization and acid precipitation; therefore, it is necessary to develop more efficient and environmentally friendly techniques. In addition, the functional properties of yeast proteins limit their application in the food industry. To improve these properties, methods must be selected to modify the secondary and tertiary structures of yeast proteins. This paper explores how proteomic analysis can be used to identify nutrient-rich yeast strains, compares the process of preparing yeast proteins, and investigates how modification methods affect the function and structure of yeast proteins. It provides a theoretical basis for solving the problem of inadequate protein intake in China and explores future prospects.
Affiliation(s)
- Yan Zhao
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Zhaowei Han
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Xuchun Zhu
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Bingyu Chen
  - Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Linyi Zhou
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
- Xiaoyong Liu
  - Henan Agricultural University, Zhengzhou, Henan 450002, China
- Hongzhi Liu
  - School of Food and Health, Beijing Technology and Business University, Beijing 100080, China
  - College of Food and Pharmaceutical Engineering, Guizhou Institute of Technology, Guiyang, Guizhou 550025, China
24
Mucelli X, Huang LS. Naming internal insertion alleles created using CRISPR in Saccharomyces cerevisiae. microPublication Biology 2024; 2024:10.17912/micropub.biology.001258. [PMID: 39185013 PMCID: PMC11342080 DOI: 10.17912/micropub.biology.001258] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2024] [Revised: 07/23/2024] [Accepted: 08/06/2024] [Indexed: 08/27/2024]
Abstract
The budding yeast Saccharomyces cerevisiae is a powerful model organism, partly because of the ease of genome alterations due to the combination of a fast generation time and many molecular genetic tools. Recent advances in CRISPR-based systems allow for the easier creation of alleles with internally inserted sequences within the coding regions of genes, such as the internal insertion of sequences that code for epitopes or fluorescent proteins. Here we briefly summarize some existing nomenclature standards and suggest nomenclature guidelines for internal insertion alleles that are informative, consistent, and computable.
25
Chen P, Zhang J. The loci of environmental adaptation in a model eukaryote. Nat Commun 2024; 15:5672. [PMID: 38971805 PMCID: PMC11227561 DOI: 10.1038/s41467-024-50002-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Accepted: 06/25/2024] [Indexed: 07/08/2024] Open
Abstract
While the underlying genetic changes have been uncovered in some cases of adaptive evolution, the lack of a systematic study prevents a general understanding of the genomic basis of adaptation. For example, it is unclear whether protein-coding or noncoding mutations are more important to adaptive evolution and whether adaptations to different environments are brought by genetic changes distributed in diverse genes and biological processes or concentrated in a core set. We here perform laboratory evolution of 3360 Saccharomyces cerevisiae populations in 252 environments of varying levels of stress. We find the yeast adaptations to be primarily fueled by large-effect coding mutations overrepresented in a relatively small gene set, despite prevalent antagonistic pleiotropy across environments. Populations generally adapt faster in more stressful environments, partly because of greater benefits of the same mutations in more stressful environments. These and other findings from this model eukaryote help unravel the genomic principles of environmental adaptation.
Affiliation(s)
- Piaopiao Chen
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
  - College of Life Sciences, Zhejiang University, Hangzhou, 310058, China
- Jianzhi Zhang
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, Michigan, 48109, USA
26
Chaudhari JK, Pant S, Jha R, Pathak RK, Singh DB. Biological big-data sources, problems of storage, computational issues, and applications: a comprehensive review. Knowl Inf Syst 2024; 66:3159-3209. [DOI: 10.1007/s10115-023-02049-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/12/2023] [Accepted: 12/11/2023] [Indexed: 01/03/2025]
27
Li Z, Wang S, Cui H, Liu X, Zhang Y. Spatiotemporal constrained RNA-protein heterogeneous network for protein complex identification. Brief Bioinform 2024; 25:bbae280. [PMID: 38856171 PMCID: PMC11163383 DOI: 10.1093/bib/bbae280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/05/2024] [Accepted: 05/24/2024] [Indexed: 06/11/2024] Open
Abstract
The identification of protein complexes from protein interaction networks is crucial in the understanding of protein function, cellular processes and disease mechanisms. Existing methods commonly rely on the assumption that protein interaction networks are highly reliable, yet in reality, there is considerable noise in the data. In addition, these methods fail to account for the regulatory roles of biomolecules during the formation of protein complexes, which is crucial for understanding the generation of protein interactions. To this end, we propose a SpatioTemporal constrained RNA-protein heterogeneous network for Protein Complex Identification (STRPCI). STRPCI first constructs a multiplex heterogeneous protein information network to capture deep semantic information by extracting spatiotemporal interaction patterns. Then, it utilizes a dual-view aggregator to aggregate heterogeneous neighbor information from different layers. Finally, through contrastive learning, STRPCI collaboratively optimizes the protein embedding representations under different spatiotemporal interaction patterns. Based on the protein embedding similarity, STRPCI reweights the protein interaction network and identifies protein complexes with a core-attachment strategy. By considering the spatiotemporal constraints and biomolecular regulatory factors of protein interactions, STRPCI measures the tightness of interactions, thus mitigating the impact of noisy data on complex identification. Evaluation results on four real PPI networks demonstrate the effectiveness and strong biological significance of STRPCI. The source code of STRPCI is available from https://github.com/LI-jasm/STRPCI.
Affiliation(s)
- Zeqian Li
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Shilong Wang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Hai Cui
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
- Xiaoxia Liu
  - Department of Neurology and Neurological Sciences, Stanford University, CA 94305, USA
- Yijia Zhang
  - School of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
28
Desroches Altamirano C, Kang MK, Jordan MA, Borianne T, Dilmen I, Gnädig M, von Appen A, Honigmann A, Franzmann TM, Alberti S. eIF4F is a thermo-sensing regulatory node in the translational heat shock response. Mol Cell 2024; 84:1727-1741.e12. [PMID: 38547866 DOI: 10.1016/j.molcel.2024.02.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 12/18/2023] [Accepted: 02/29/2024] [Indexed: 05/05/2024]
Abstract
Heat-shocked cells prioritize the translation of heat shock (HS) mRNAs, but the underlying mechanism is unclear. We report that HS in budding yeast induces the disassembly of the eIF4F complex, where eIF4G and eIF4E assemble into translationally arrested mRNA ribonucleoprotein particles (mRNPs) and HS granules (HSGs), whereas eIF4A promotes HS translation. Using in vitro reconstitution biochemistry, we show that a conformational rearrangement of the thermo-sensing eIF4A-binding domain of eIF4G dissociates eIF4A and promotes the assembly with mRNA into HS-mRNPs, which recruit additional translation factors, including Pab1p and eIF4E, to form multi-component condensates. Using extracts and cellular experiments, we demonstrate that HS-mRNPs and condensates repress the translation of associated mRNA and deplete translation factors that are required for housekeeping translation, whereas HS mRNAs can be efficiently translated by eIF4A. We conclude that the eIF4F complex is a thermo-sensing node that regulates translation during HS.
Affiliation(s)
- Christine Desroches Altamirano
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Moo-Koo Kang
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Mareike A Jordan
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Tom Borianne
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Irem Dilmen
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Maren Gnädig
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Alexander von Appen
  - Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstraße 108, 01307 Dresden, Germany
- Alf Honigmann
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Titus M Franzmann
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
- Simon Alberti
  - Biotechnology Center (BIOTEC), Center for Molecular and Cellular Bioengineering, Technische Universität Dresden, Tatzberg 47/49, 01307 Dresden, Germany
29
Saha S, Chatterjee P, Basu S, Nasipuri M. EPI-SF: essential protein identification in protein interaction networks using sequence features. PeerJ 2024; 12:e17010. [PMID: 38495766 PMCID: PMC10944162 DOI: 10.7717/peerj.17010] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 11/24/2023] [Accepted: 02/05/2024] [Indexed: 03/19/2024] Open
Abstract
Proteins are considered indispensable for an organism's viability, reproductive capabilities, and other fundamental physiological functions. Conventional biological assays for identifying essential proteins are time-consuming, labor-intensive, and expensive. Therefore, computational methods are widely regarded as the most expeditious and effective approach to identifying essential proteins. Despite being a popular choice in machine learning (ML) applications, deep learning (DL) is not suggested for this sequence-feature-based work because of the restricted availability of high-quality training sets of positive and negative samples. However, some recent DL studies have worked with limited data, and this direction is left for future work. Conventional ML techniques are thus utilized in this work due to their superior performance compared to DL methodologies. Accordingly, a technique called EPI-SF is proposed here, which employs ML to identify essential proteins within the protein-protein interaction network (PPIN). The protein sequence is the primary determinant of protein structure and function. So, initially, relevant protein sequence features are extracted from the proteins within the PPIN. These features are subsequently used as input for various machine learning models, including the XGBoost classifier, the AdaBoost classifier, logistic regression (LR), support vector machine (SVM), Decision Tree (DT), Random Forest (RF), and Naïve Bayes (NB) models. The objective is to detect the essential proteins within the PPIN. The primary investigation examined the performance of the various ML models on the yeast PPIN. Among these models, the RF model was the most effective, as indicated by its precision, recall, F1-score, and AUC values of 0.703, 0.720, 0.711, and 0.745, respectively. It also outperformed state-of-the-art methods based on traditional centrality measures, such as betweenness centrality (BC) and closeness centrality (CC), as well as deep learning methods such as DeepEP, as emphasized in the results section. As a result of its favorable performance, EPI-SF is later employed for the prediction of novel essential proteins inside the human PPIN. Because viruses tend to selectively target essential proteins involved in the transmission of diseases within the human PPIN, investigations are conducted to assess the probable involvement of these proteins in COVID-19 and other related severe diseases.
Affiliation(s)
- Sovan Saha
  - Department of Computer Science & Engineering (Artificial Intelligence & Machine Learning), Techno Main Salt Lake, Kolkata, West Bengal, India
- Piyali Chatterjee
  - Department of Computer Science & Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India
- Subhadip Basu
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
- Mita Nasipuri
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, West Bengal, India
30
Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ. Systematic profiling of ale yeast protein dynamics across fermentation and repitching. G3 (Bethesda, Md.) 2024; 14:jkad293. [PMID: 38135291 PMCID: PMC10917522 DOI: 10.1093/g3journal/jkad293] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 11/28/2023] [Accepted: 12/07/2023] [Indexed: 12/24/2023]
Abstract
Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is among the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout 2 fermentation cycles which were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches compared to those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.
Affiliation(s)
- Riddhiman K Garge
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Renee C Geck
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Joseph O Armstrong
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Barbara Dunn
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Daniel R Boutz
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
  - Antibody Discovery and Accelerated Protein Therapeutics, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute, Houston, TX 77030, USA
- Anna Battenhouse
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Mario Leutert
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich 8049, Switzerland
- Vy Dang
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Pengyao Jiang
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Edward M Marcotte
  - Department of Molecular Biosciences, The University of Texas at Austin, Austin, TX 78712, USA
- Maitreya J Dunham
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
31
Lee B, Hokamp K, Alhussain MM, Bamagoos AA, Fleming AB. The influence of flocculation upon global gene transcription in a yeast CYC8 mutant. Microb Genom 2024; 10:001216. [PMID: 38529898 PMCID: PMC10995634 DOI: 10.1099/mgen.0.001216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Accepted: 02/29/2024] [Indexed: 03/27/2024] Open
Abstract
The transcriptome from a Saccharomyces cerevisiae tup1 deletion mutant was one of the first comprehensive yeast transcriptomes published. Subsequent transcriptomes from tup1 and cyc8 mutants firmly established the Tup1-Cyc8 complex as predominantly acting as a repressor of gene transcription. However, transcriptomes from tup1/cyc8 gene deletion or conditional mutants would all have been influenced by the striking flocculation phenotypes that these mutants display. In this study, we have separated the impact of flocculation from the transcriptome in a cyc8 conditional mutant to reveal those genes (i) subject solely to Cyc8p-dependent regulation, (ii) regulated by flocculation only and (iii) regulated by Cyc8p and further influenced by flocculation. We reveal a more accurate list of Cyc8p-regulated genes that includes newly identified Cyc8p-regulated genes that were masked by the flocculation phenotype and excludes genes which were indirectly influenced by flocculation and not regulated by Cyc8p. Furthermore, we show evidence that flocculation exerts a complex and potentially dynamic influence upon global gene transcription. These data should be of interest to future studies into the mechanism of action of the Tup1-Cyc8 complex and to studies involved in understanding the development of flocculation and its impact upon cell function.
Affiliation(s)
- Brenda Lee
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
- Karsten Hokamp
  - Department of Genetics, School of Genetics and Microbiology, Smurfit Institute, Trinity College Dublin, Dublin, Ireland
- Mohamed M. Alhussain
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
- Atif A. Bamagoos
  - Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
- Alastair B. Fleming
  - Department of Microbiology, School of Genetics and Microbiology, Moyne Institute of Preventive Medicine, Trinity College Dublin, Dublin, Ireland
32
Sousa AD, Costa AL, Costa V, Pereira C. Prediction and biological analysis of yeast VDAC1 phosphorylation. Arch Biochem Biophys 2024; 753:109914. [PMID: 38290597 DOI: 10.1016/j.abb.2024.109914] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/02/2024] [Accepted: 01/25/2024] [Indexed: 02/01/2024]
Abstract
The mitochondrial outer membrane protein porin 1 (Por1), the yeast orthologue of mammalian voltage-dependent anion channel (VDAC), is the major permeability pathway for the flux of metabolites and ions between cytosol and mitochondria. In yeast, several Por1 phosphorylation sites have been identified. Protein phosphorylation is a major modification regulating a variety of biological activities, but the potential biological roles of Por1 phosphorylation remain unaddressed. In this work, we analysed 10 experimentally observed phosphorylation sites in yeast Por1 using bioinformatics tools. Two of the residues, T100 and S133, predicted to reduce and increase pore permeability, respectively, were validated using biological assays. Accordingly, the Por1T100D phosphomimetic mutant reduced mitochondrial respiration, while the Por1S133E phosphomimetic mutant increased it. Por1T100A expression also improved respiratory growth, while Por1S133A caused defects in all growth conditions tested, notably in fermenting media. In conclusion, we found that phosphorylation has the potential to modulate Por1, causing a marked effect on mitochondrial function. It can also affect cell morphology and growth in both respiratory and, unexpectedly, fermenting conditions, expanding our knowledge of the role of Por1 in cell physiology.
Affiliation(s)
- André D Sousa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
- Ana Luisa Costa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
- Vítor Costa
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal; ICBAS - Instituto de Ciências Biomédicas Abel Salazar, Universidade do Porto, Portugal
- Clara Pereira
  - i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Portugal; IBMC - Instituto de Biologia Celular e Molecular, Universidade do Porto, Portugal
33
Wang J, Li S, Sun Z, Lao Q, Shen B, Li K, Nie Y. Full-length radiograph based automatic musculoskeletal modeling using convolutional neural network. J Biomech 2024; 166:112046. [PMID: 38467079 DOI: 10.1016/j.jbiomech.2024.112046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 02/27/2024] [Accepted: 03/07/2024] [Indexed: 03/13/2024]
Abstract
Full-length radiographs contain information from which many anatomical parameters of the pelvis, femur, and tibia may be derived, but only a few anatomical parameters are used for musculoskeletal modeling. This study aimed to develop a fully automatic algorithm that extracts anatomical parameters from a full-length radiograph to generate a musculoskeletal model that is more accurate than a linearly scaled one. A U-Net convolutional neural network was trained to segment the pelvis, femur, and tibia from the full-length radiograph. Eight anatomic parameters (six for length and width, two for angles) were automatically extracted from the bone segmentation masks and used to generate the musculoskeletal model. The Sørensen-Dice coefficient was used to quantify the consistency of automatic bone segmentation masks with manually segmented labels. Maximum distance error, root mean square (RMS) distance error and Jaccard index (JI) were used to evaluate the geometric accuracy of the automatically generated pelvis, femur and tibia models versus CT bone models. Mean Sørensen-Dice coefficients for the pelvis, femur and tibia 2D segmentation masks were 0.9898, 0.9822 and 0.9786, respectively. The algorithm-driven bone models were closer to the 3D CT bone models than the scaled generic models in geometry, with significantly lower maximum distance error (28.3% average decrease from 24.35 mm) and RMS distance error (28.9% average decrease from 9.55 mm) and higher JI (17.2% average increase from 0.46) (P < 0.001). The algorithm-driven musculoskeletal modeling (107.15 ± 10.24 s) was faster than the manual process (870.07 ± 44.79 s) for the same full-length radiograph. This algorithm provides a fully automatic way to generate a musculoskeletal model from a full-length radiograph that achieves an approximately 30% reduction in distance errors, which could enable personalized musculoskeletal simulation based on full-length radiographs for large-scale OA populations.
Collapse
Affiliation(s)
- Junqing Wang
- Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China.
| | - Shiqi Li
- Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China; West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China; College of Electrical Engineering, Sichuan University, Chengdu, Sichuan Province, China.
| | - Zitong Sun
- Sichuan University-Pittsburgh Institute (SCUPI), Sichuan University, Chengdu, Sichuan Province, China.
| | - Qicheng Lao
- School of Artificial Intelligence, Beijing University of Posts and Telecommunications (BUPT), Beijing, China
| | - Bin Shen
- Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China.
| | - Kang Li
- West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China.
| | - Yong Nie
- Department of Orthopedic Surgery and Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan Province, China.
| |
|
34
|
Li H, Xie J, Song J, Jin C, Xin H, Pan X, Ke J, Yuan Y, Shen H, Ning G. CRCS: An automatic image processing pipeline for hormone level analysis of Cushing's disease. Methods 2024; 222:28-40. [PMID: 38159688 DOI: 10.1016/j.ymeth.2023.12.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/01/2023] [Accepted: 12/25/2023] [Indexed: 01/03/2024] Open
Abstract
Due to the abnormal secretion of adrenocorticotropic hormone (ACTH) by tumors, Cushing's disease leads to hypercortisonemia, a precursor to a series of metabolic disorders and serious complications. After surgical resection, Cushing's disease has a high recurrence rate and a short time to recurrence, and the reasons for recurrence remain unknown. Qualitative or quantitative automatic analysis of histology images could provide insights into Cushing's disease, but to the best of our knowledge no software has been available. In this study, we propose a quantitative image analysis-based pipeline, CRCS, which aims to explore the relationship between the expression level of ACTH in normal cell tissues adjacent to tumor cells and the postoperative prognosis of patients. CRCS mainly consists of image-level clustering, cluster-level multi-modal image registration, patch-level image classification, and pixel-level image segmentation on whole slide imaging (WSI). On both the image registration and classification tasks, CRCS achieves state-of-the-art performance compared to recently published methods on our collected benchmark dataset. In addition, CRCS achieves an accuracy of 0.83 for postoperative prognosis of 12 cases. CRCS demonstrates great potential for instrumenting automatic diagnosis and treatment for Cushing's disease.
Affiliation(s)
- Haiyue Li
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Jing Xie
- Department of Pathology, Ruijin Hospital, Shanghai Jiao Tong University, School of Medicine, 197 Ruijin 2nd Road, Shanghai 200025, China
| | - Jialin Song
- The Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology, Xi'an Jiao Tong University, Xi'an 710049, China
| | - Cheng Jin
- Medical Robot Research Institute, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Hongyi Xin
- University of Michigan - Shanghai Jiao Tong University Joint Institute, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Jing Ke
- Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Ye Yuan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Hongbin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Guang Ning
- State Key Laboratory of Medical Genomes, National Clinical Research Center for Endocrine and Metabolic Diseases, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China; Laboratory of Endocrinology and Metabolism, Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) & Shanghai Jiao Tong University School of Medicine (SJTUSM), Shanghai, China.
| |
|
35
|
Gaikani HK, Stolar M, Kriti D, Nislow C, Giaever G. From beer to breadboards: yeast as a force for biological innovation. Genome Biol 2024; 25:10. [PMID: 38178179 PMCID: PMC10768129 DOI: 10.1186/s13059-023-03156-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Accepted: 12/21/2023] [Indexed: 01/06/2024] Open
Abstract
The history of the yeast Saccharomyces cerevisiae, aka brewer's or baker's yeast, is intertwined with our own. Initially domesticated 8,000 years ago to provide sustenance to our ancestors, yeast has for the past 150 years served as a model research subject and a platform for technology. In this review, we highlight many ways in which yeast has served to catalyze the fields of functional genomics, genome editing, gene-environment interaction investigation, proteomics, and bioinformatics, emphasizing how yeast has served as a catalyst for innovation. Several possible futures for this model organism in synthetic biology, drug personalization, and multi-omics research are also presented.
Affiliation(s)
- Hamid Kian Gaikani
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
- Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
| | - Monika Stolar
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
| | - Divya Kriti
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
| | - Corey Nislow
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada.
| | - Guri Giaever
- Faculty of Pharmaceutical Sciences, University of British Columbia, Vancouver, BC, Canada
| |
|
36
|
Li G, Luo X, Hu Z, Wu J, Peng W, Liu J, Zhu X. Essential proteins discovery based on dominance relationship and neighborhood similarity centrality. Health Inf Sci Syst 2023; 11:55. [PMID: 37981988 PMCID: PMC10654316 DOI: 10.1007/s13755-023-00252-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 10/13/2023] [Indexed: 11/21/2023] Open
Abstract
Essential proteins play a vital role in the development and reproduction of cells. The identification of essential proteins helps to understand the basic survival of cells. Because biological experimental methods for discovering essential proteins are time-consuming, costly, and inefficient, computational methods have gained increasing attention. Initially, essential proteins were mainly identified by centralities based on protein-protein interaction (PPI) networks, whose identification rate is limited by the many false positives in PPI networks. In this study, a purified PPI network is first introduced to reduce the impact of false positives in the PPI network. Second, by analyzing the similarity relationship between a protein and its neighbors in the PPI network, a new centrality called neighborhood similarity centrality (NSC) is proposed. Third, based on subcellular localization and orthologous data, the protein subcellular localization score and ortholog score are calculated, respectively. Fourth, by analyzing a large number of methods based on multi-feature fusion, it is found that there is a special relationship among features, called the dominance relationship, and a novel model based on the dominance relationship is proposed. Finally, NSC, the subcellular localization score, and the ortholog score are fused by the dominance relationship model, and a new method called NSO is proposed. To verify the performance of NSO, seven representative methods (ION, NCCO, E_POC, SON, JDC, PeC, WDC) are compared on yeast datasets. The experimental results show that the NSO method has a higher identification rate than the other methods.
Affiliation(s)
- Gaoshi Li
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
| | - Xinlong Luo
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
| | - Zhipeng Hu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
| | - Jingli Wu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, 650500 Yunnan China
| | - Jiafei Liu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
| | - Xiaoshu Zhu
- Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004 China
- Guangxi Key Lab of Multi-source Information Mining & Security, Guangxi Normal University, Guilin, 541004 Guangxi China
- College of Computer Science and Engineering, Guangxi Normal University, Guilin, 541004 Guangxi China
- School of Computer and Information Security & School of Software Engineering, Guilin University of Electronic Science and Technology, Guilin, China
| |
|
37
|
Zhao H, Liu G, Cao X. A seed expansion-based method to identify essential proteins by integrating protein-protein interaction sub-networks and multiple biological characteristics. BMC Bioinformatics 2023; 24:452. [PMID: 38036960 PMCID: PMC10688502 DOI: 10.1186/s12859-023-05583-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2023] [Accepted: 11/24/2023] [Indexed: 12/02/2023] Open
Abstract
BACKGROUND The identification of essential proteins is of great significance in biology and pathology. However, protein-protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins. RESULTS In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. Firstly, SESN utilizes gene expression data to construct PPI sub-networks. Secondly, seed expansion is performed simultaneously in each sub-network, and the expansion process is based on the topological features of predicted essential proteins. Thirdly, the error correction mechanism is based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complex, gene expression data, GO annotations, and subcellular localization, and adopts the biological data with the best experimental results. The output of SESN is a set of predicted essential proteins. CONCLUSIONS The analysis of each component of SESN indicates the effectiveness of all components. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
Affiliation(s)
- He Zhao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Guixia Liu
- College of Computer Science and Technology, Jilin University, Changchun, China.
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China.
| | - Xintian Cao
- College of Computer Science and Technology, Jilin University, Changchun, China
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
38
|
Yu C, Wang Y, Tang C, Feng W, Lv J. EU-Net: Automatic U-Net neural architecture search with differential evolutionary algorithm for medical image segmentation. Comput Biol Med 2023; 167:107579. [PMID: 39491922 DOI: 10.1016/j.compbiomed.2023.107579] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/15/2023] [Revised: 09/23/2023] [Accepted: 10/15/2023] [Indexed: 11/05/2024]
Abstract
Medical images are crucial in clinical practice, providing essential information for patient assessment and treatment planning. However, manual extraction of information from images is both time-consuming and prone to errors. The emergence of U-Net addresses this challenge by automating the segmentation of anatomical structures and pathological lesions in medical images, thereby significantly enhancing the accuracy of image interpretation and diagnosis. However, the performance of U-Net largely depends on its encoder-decoder structure, whose design requires researchers with knowledge of neural network architecture design and an in-depth understanding of medical images. In this paper, we propose an automatic U-Net Neural Architecture Search (NAS) algorithm using the differential evolutionary (DE) algorithm, named EU-Net, to segment critical information in medical images to assist physicians in diagnosis. Specifically, by introducing a variable-length strategy, the proposed EU-Net algorithm can sufficiently and automatically search for the neural network architecture without expertise. Moreover, the crossover, mutation, and selection strategies of DE take account of the trade-off between exploration and exploitation in the search space. Finally, in the encoding and decoding phases of the proposed algorithm, different block-based and layer-based structures are introduced for architectural optimization. The proposed EU-Net algorithm is validated on two widely used medical datasets, i.e., CHAOS and BUSI, for image segmentation tasks. Extensive experimental results show that the proposed EU-Net algorithm outperforms the chosen peer competitors on both datasets. In particular, compared to the original U-Net, our proposed method improves the mIoU metric by at least 6%.
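The mutation, crossover, and selection steps of differential evolution mentioned in this abstract can be illustrated with the classic DE/rand/1/bin scheme. The sketch below is not EU-Net: it minimizes a toy continuous function instead of searching over network architectures, it uses fixed-length vectors rather than a variable-length encoding, and all names and default parameters (`differential_evolution`, `f`, `cr`, `sphere`) are assumptions chosen for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimize `objective` over a box using the DE/rand/1/bin scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(x) for x in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            # Binomial crossover: at least one coordinate comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    solution, value = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
    print(solution, value)
```

In this scheme the scale factor `f` and crossover rate `cr` control how far trial vectors move from existing ones (exploration), while the greedy replacement step retains improvements (exploitation).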
Affiliation(s)
- Caiyang Yu
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| | - Yixi Wang
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| | - Chenwei Tang
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| | - Wentao Feng
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| | - Jiancheng Lv
- College of Computer Science, Sichuan University, Chengdu, 610065, China.
| |
|
39
|
Sahoo TR, Patra S, Vipsita S. Decision tree classifier based on topological characteristics of subgraph for the mining of protein complexes from large scale PPI networks. Comput Biol Chem 2023; 106:107935. [PMID: 37536230 DOI: 10.1016/j.compbiolchem.2023.107935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Revised: 06/11/2023] [Accepted: 07/23/2023] [Indexed: 08/05/2023]
Abstract
The growing accessibility of large-scale protein interaction data demands extensive research to understand cell organization and its functioning at the network level. Bioinformatics and data mining researchers have extensively studied network clustering to examine the structural and operational features of protein-protein interaction (PPI) networks. Clustering PPI networks has proven useful in numerous studies over the past two decades for identifying functional modules, understanding the roles of previously unknown proteins, and other purposes. Protein complexes represent one of the essential cellular components for creating biological activities. Inferring protein complexes has been made more accessible by experimental approaches. We offer a novel method that integrates the classification model with local topological data, making it more reliable and efficient. This article describes a decision tree classifier based on topological characteristics of the subgraph for mining protein complexes. The proposed graph-based algorithm is an effective and efficient way to identify protein complexes from large-scale PPI networks. The performance of the proposed algorithm is evaluated on protein-protein interaction networks of yeast and human in the Database of Interacting Proteins (DIP) and the Biological General Repository for Interaction Datasets (BioGRID), using widely accepted benchmark protein complexes from the comprehensive resource of mammalian protein complexes (CORUM) and the comprehensive catalogue of yeast protein complexes (CYC2008). The outcomes demonstrate that our method can outperform the best-performing supervised, semi-supervised, and unsupervised approaches to detecting protein complexes.
Affiliation(s)
- Tushar Ranjan Sahoo
- Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India.
| | - Sabyasachi Patra
- Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India.
| | - Swati Vipsita
- Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India.
| |
|
40
|
Chen K, Zhang X, Zhou X, Mi B, Xiao Y, Zhou L, Wu Z, Wu L, Wang X. Privacy preserving federated learning for full heterogeneity. ISA Trans 2023; 141:73-83. [PMID: 37105888 DOI: 10.1016/j.isatra.2023.04.020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2022] [Revised: 04/01/2023] [Accepted: 04/14/2023] [Indexed: 06/19/2023]
Abstract
Federated learning is a novel distributed machine learning paradigm that supports cooperative model training among multiple participant clients, where each client keeps its private data locally to protect its data privacy. However, in practical application domains, federated learning still faces several heterogeneity challenges, such as data heterogeneity, model heterogeneity, and computation heterogeneity, which significantly decrease its global model performance. To the best of our knowledge, existing solutions focus on only one or two challenges in their heterogeneous settings. In this paper, to address the above challenges simultaneously, we present a novel solution called Full Heterogeneous Federated Learning (FHFL). First, we propose a synthetic data generation approach to mitigate the Non-IID data heterogeneity problem. Second, we use knowledge distillation to learn from heterogeneous models of participant clients for model aggregation in the central server. Finally, we propose an opportunistic computation scheduling strategy to exploit the idle computation resources of fast-computing clients. Experimental results on different datasets show that our FHFL method can achieve excellent model training performance. We believe it will serve as a pioneering work for distributed model training among heterogeneous clients in federated learning.
Affiliation(s)
- Kongyang Chen
- Institute of Artificial Intelligence and Blockchain, Guangzhou University, China; Pazhou Lab, Guangzhou, China; Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi, China
| | - Xiaoxue Zhang
- School of Computer Science and Cyber Engineering, Guangzhou University, China
| | - Xiuhua Zhou
- School of Computer Science and Cyber Engineering, Guangzhou University, China
| | - Bing Mi
- School of Public Finance and Taxation, Guangdong University of Finance and Economics, China
| | - Yatie Xiao
- School of Computer Science and Cyber Engineering, Guangzhou University, China
| | - Lei Zhou
- Department of Otorhinolaryngology-Head and Neck Surgery, Zhongshan Hospital Affiliated to Fudan University, China
| | - Zhen Wu
- Third Affiliated Hospital, Sun Yat-sen University, China
| | - Lin Wu
- Third Affiliated Hospital, Sun Yat-sen University, China.
| | - Xiaoying Wang
- Third Affiliated Hospital, Sun Yat-sen University, China.
| |
|
41
|
Garge RK, Geck RC, Armstrong JO, Dunn B, Boutz DR, Battenhouse A, Leutert M, Dang V, Jiang P, Kwiatkowski D, Peiser T, McElroy H, Marcotte EM, Dunham MJ. Systematic Profiling of Ale Yeast Protein Dynamics across Fermentation and Repitching. bioRxiv 2023:2023.09.21.558736. [PMID: 37790497 PMCID: PMC10543003 DOI: 10.1101/2023.09.21.558736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Studying the genetic and molecular characteristics of brewing yeast strains is crucial for understanding their domestication history and adaptations accumulated over time in fermentation environments, and for guiding optimizations to the brewing process itself. Saccharomyces cerevisiae (brewing yeast) is amongst the most profiled organisms on the planet, yet the temporal molecular changes that underlie industrial fermentation and beer brewing remain understudied. Here, we characterized the genomic makeup of a Saccharomyces cerevisiae ale yeast widely used in the production of Hefeweizen beers, and applied shotgun mass spectrometry to systematically measure the proteomic changes throughout two fermentation cycles which were separated by 14 rounds of serial repitching. The resulting brewing yeast proteomics resource includes 64,740 protein abundance measurements. We found that this strain possesses typical genetic characteristics of Saccharomyces cerevisiae ale strains and displayed progressive shifts in molecular processes during fermentation based on protein abundance changes. We observed protein abundance differences between early fermentation batches compared to those separated by 14 rounds of serial repitching. The observed abundance differences occurred mainly in proteins involved in the metabolism of ergosterol and isobutyraldehyde. Our systematic profiling serves as a starting point for deeper characterization of how the yeast proteome changes during commercial fermentations and additionally serves as a resource to guide fermentation protocols, strain handling, and engineering practices in commercial brewing and fermentation environments. Finally, we created a web interface (https://brewing-yeast-proteomics.ccbb.utexas.edu/) to serve as a valuable resource for yeast geneticists, brewers, and biochemists to provide insights into the global trends underlying commercial beer production.
Affiliation(s)
- Riddhiman K. Garge
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
| | - Renee C. Geck
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Joseph O. Armstrong
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Barbara Dunn
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Daniel R. Boutz
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
- Houston Methodist Research Institute, Houston, Texas, USA
| | - Anna Battenhouse
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
| | - Mario Leutert
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
| | - Vy Dang
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
| | - Pengyao Jiang
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| | - Edward M. Marcotte
- Department of Molecular Biosciences, The University of Texas at Austin, Austin, Texas, USA
| | - Maitreya J. Dunham
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
| |
|
42
|
Han Y, Liu M, Wang Z. Key protein identification by integrating protein complex information and multi-biological features. Math Biosci Eng 2023; 20:18191-18206. [PMID: 38052554 DOI: 10.3934/mbe.2023808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Identifying key proteins based on protein-protein interaction networks has emerged as a prominent area of research in bioinformatics. However, current methods exhibit certain limitations, such as the omission of subcellular localization information and the disregard for the impact of topological structure noise on the reliability of key protein identification. Moreover, the influence of proteins outside a complex but interacting with proteins inside the complex on complex participation tends to be overlooked. Addressing these shortcomings, this paper presents a novel method for key protein identification that integrates protein complex information with multiple biological features. This approach offers a comprehensive evaluation of protein importance by considering subcellular localization centrality, topological centrality weighted by gene ontology (GO) similarity and complex participation centrality. Experimental results, including traditional statistical metrics, jackknife methodology metric and key protein overlap or difference, demonstrate that the proposed method not only achieves higher accuracy in identifying key proteins compared to nine classical methods but also exhibits robustness across diverse protein-protein interaction networks.
Affiliation(s)
- Yongyin Han
- School of Computer Science and Technology, China University of Mining and Technology, China
- Xuzhou College of Industrial Technology, China
| | - Maolin Liu
- School of Computer Science and Technology, China University of Mining and Technology, China
| | - Zhixiao Wang
- School of Computer Science and Technology, China University of Mining and Technology, China
| |
|
43
|
Chen H, Pelizzola M, Futschik A. Haplotype based testing for a better understanding of the selective architecture. BMC Bioinformatics 2023; 24:322. [PMID: 37633901 PMCID: PMC10463365 DOI: 10.1186/s12859-023-05437-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2022] [Accepted: 08/03/2023] [Indexed: 08/28/2023] Open
Abstract
BACKGROUND The identification of genomic regions affected by selection is one of the most important goals in population genetics. If temporal data are available, allele frequency changes at SNP positions are often used for this purpose. Here we provide a new testing approach that uses haplotype frequencies instead of allele frequencies. RESULTS Using simulated data, we show that, compared to SNP-based tests, our approach has higher power, especially when the number of candidate haplotypes is small or moderate. To improve power when the number of haplotypes is large, we investigate methods to combine them into a moderate number of haplotype subsets. Haplotype frequencies can often be recovered with less noise than SNP frequencies, especially under pool sequencing, giving our test an additional advantage. Furthermore, spurious outlier SNPs may lead to false positives, a problem usually not encountered when working with haplotypes. Post hoc tests for the number of selected haplotypes and for differences between their selection coefficients are also provided for a better understanding of the underlying selection dynamics. An application to a real data set further illustrates the performance benefits. CONCLUSIONS Owing to a lighter multiple-testing correction and reduced noise, haplotype-based testing is able to outperform SNP-based tests in terms of power in most scenarios.
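The relationship between the two kinds of frequency compared in this abstract can be made concrete with a small sketch: the allele frequency at each SNP is the summed frequency of the haplotypes that carry that allele. This is only an illustration of that bookkeeping, not the authors' test; the function name, the 0/1 string encoding, and the example frequencies are all hypothetical.

```python
def snp_frequencies(haplotypes, hap_freqs):
    """Derive per-SNP allele frequencies from haplotype frequencies.

    haplotypes: equal-length strings over '0'/'1', where '1' marks the
                allele being tracked at that SNP position.
    hap_freqs:  one frequency per haplotype, expected to sum to 1.
    """
    if len(haplotypes) != len(hap_freqs):
        raise ValueError("need exactly one frequency per haplotype")
    n_sites = len(haplotypes[0])
    if any(len(h) != n_sites for h in haplotypes):
        raise ValueError("haplotypes must all have the same length")
    # The frequency at a site is the total frequency of haplotypes carrying '1'.
    return [
        sum(freq for hap, freq in zip(haplotypes, hap_freqs) if hap[site] == "1")
        for site in range(n_sites)
    ]


if __name__ == "__main__":
    # Hypothetical example: three haplotypes spanning five SNPs.
    haps = ["00000", "10110", "11001"]
    freqs = [0.5, 0.3, 0.2]
    print(snp_frequencies(haps, freqs))
```

In this toy case five SNP frequencies are fully determined by three haplotype frequencies, which is one way to see why testing at the haplotype level can require fewer tests than testing each SNP separately when the number of haplotypes is small.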
Affiliation(s)
- Haoyu Chen
- University of Veterinary Medicine Vienna, Vienna, Austria
- Vienna Graduate School of Population Genetics, Vienna, Austria
| |
|
44
|
Wacholder A, Parikh SB, Coelho NC, Acar O, Houghton C, Chou L, Carvunis AR. A vast evolutionarily transient translatome contributes to phenotype and fitness. Cell Syst 2023; 14:363-381.e8. [PMID: 37164009 PMCID: PMC10348077 DOI: 10.1016/j.cels.2023.04.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/18/2022] [Revised: 01/30/2023] [Accepted: 04/06/2023] [Indexed: 05/12/2023]
Abstract
Translation is the process by which ribosomes synthesize proteins. Ribosome profiling recently revealed that many short sequences previously thought to be noncoding are pervasively translated. To identify protein-coding genes in this noncanonical translatome, we combine an integrative framework for extremely sensitive ribosome profiling analysis, iRibo, with high-powered selection inferences tailored for short sequences. We construct a reference translatome for Saccharomyces cerevisiae comprising 5,400 canonical and almost 19,000 noncanonical translated elements. Only 14 noncanonical elements were evolving under detectable purifying selection. A representative subset of translated elements lacking signatures of selection demonstrated involvement in processes including DNA repair, stress response, and post-transcriptional regulation. Our results suggest that most translated elements are not conserved protein-coding genes and contribute to genotype-phenotype relationships through fast-evolving molecular mechanisms.
Affiliation(s)
- Aaron Wacholder
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Saurin Bipin Parikh
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Nelson Castilho Coelho
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Omer Acar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Carly Houghton
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Joint CMU-Pitt PhD Program in Computational Biology, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Lin Chou
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Integrative Systems Biology Program, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
| | - Anne-Ruxandra Carvunis
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA; Pittsburgh Center for Evolutionary Biology and Medicine, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
| |
|
45
|
Wong ED, Miyasato SR, Aleksander S, Karra K, Nash RS, Skrzypek MS, Weng S, Engel SR, Cherry JM. Saccharomyces genome database update: server architecture, pan-genome nomenclature, and external resources. Genetics 2023; 224:iyac191. [PMID: 36607068 PMCID: PMC10158836 DOI: 10.1093/genetics/iyac191] [Citation(s) in RCA: 52] [Impact Index Per Article: 26.0] [Received: 11/16/2022] [Revised: 11/16/2022] [Accepted: 12/21/2022] [Indexed: 01/07/2023] Open
Abstract
As one of the first model organism knowledgebases, Saccharomyces Genome Database (SGD) has been supporting the scientific research community since 1993. As technologies and research evolve, so does SGD: from updates in software architecture, to curation of novel data types, to incorporation of data from, and collaboration with, other knowledgebases. We are continuing to make steps toward providing the community with an S. cerevisiae pan-genome. Here, we describe software upgrades, a new nomenclature system for genes not found in the reference strain, and additions to gene pages. With these improvements, we aim to remain a leading resource for students, researchers, and the broader scientific community.
Affiliation(s)
- Edith D Wong
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stuart R Miyasato
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Suzi Aleksander
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Kalpana Karra
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Robert S Nash
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Marek S Skrzypek
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Shuai Weng
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - Stacia R Engel
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| | - J Michael Cherry
- Department of Genetics, Stanford University, Stanford, CA 94305, USA
| |
|
46
|
Lê-Bury P, Druart K, Savin C, Lechat P, Mas Fiol G, Matondo M, Bécavin C, Dussurget O, Pizarro-Cerdá J. Yersiniomics, a Multi-Omics Interactive Database for Yersinia Species. Microbiol Spectr 2023; 11:e0382622. [PMID: 36847572 PMCID: PMC10100798 DOI: 10.1128/spectrum.03826-22] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/17/2022] [Accepted: 01/26/2023] [Indexed: 03/01/2023] Open
Abstract
The genus Yersinia includes a large variety of nonpathogenic and life-threatening pathogenic bacteria, which cause a broad spectrum of diseases in humans and animals, such as plague, enteritis, Far East scarlet-like fever (FESLF), and enteric redmouth disease. Like most clinically relevant microorganisms, Yersinia spp. are currently subjected to intense multi-omics investigations whose numbers have increased extensively in recent years, generating massive amounts of data useful for diagnostic and therapeutic developments. The lack of a simple and centralized way to exploit these data led us to design Yersiniomics, a web-based platform allowing straightforward analysis of Yersinia omics data. Yersiniomics contains a curated multi-omics database at its core, gathering 200 genomic, 317 transcriptomic, and 62 proteomic data sets for Yersinia species. It integrates genomic, transcriptomic, and proteomic browsers, a genome viewer, and a heatmap viewer to navigate within genomes and experimental conditions. For streamlined access to structural and functional properties, it directly links each gene to GenBank, the Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt, InterPro, IntAct, and the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) and each experiment to Gene Expression Omnibus (GEO), the European Nucleotide Archive (ENA), or the Proteomics Identifications Database (PRIDE). Yersiniomics provides a powerful tool for microbiologists to assist with investigations ranging from specific gene studies to systems biology studies. IMPORTANCE The expanding genus Yersinia is composed of multiple nonpathogenic species and a few pathogenic species, including the deadly etiologic agent of plague, Yersinia pestis. In 2 decades, the number of genomic, transcriptomic, and proteomic studies on Yersinia grew massively, delivering a wealth of data. We developed Yersiniomics, an interactive web-based platform, to centralize and analyze omics data sets on Yersinia species. The platform allows user-friendly navigation between genomic data, expression data, and experimental conditions. Yersiniomics will be a valuable tool to microbiologists.
Affiliation(s)
- Pierre Lê-Bury
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
| | - Karen Druart
- Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
| | - Cyril Savin
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
| | - Pierre Lechat
- Institut Pasteur, Université Paris Cité, ALPS, Bioinformatic Hub, Paris, France
| | - Guillem Mas Fiol
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
| | - Mariette Matondo
- Institut Pasteur, Université Paris Cité, CNRS USR2000, Mass Spectrometry for Biology Unit, Proteomic Platform, Paris, France
| | | | - Olivier Dussurget
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
| | - Javier Pizarro-Cerdá
- Institut Pasteur, Université Paris Cité, CNRS UMR6047, Yersinia Research Unit, Paris, France
- Institut Pasteur, Université Paris Cité, Yersinia National Reference Laboratory, WHO Collaborating Research & Reference Centre for Plague FRA-140, Paris, France
| |
|
47
|
Chen H, Cai Y, Ji C, Selvaraj G, Wei D, Wu H. AdaPPI: identification of novel protein functional modules via adaptive graph convolution networks in a protein-protein interaction network. Brief Bioinform 2023; 24:bbac523. [PMID: 36526282 DOI: 10.1093/bib/bbac523] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/16/2022] [Revised: 10/10/2022] [Accepted: 11/02/2022] [Indexed: 12/23/2022] Open
Abstract
Identifying unknown protein functional modules, such as protein complexes and biological pathways, from protein-protein interaction (PPI) networks provides biologists with an opportunity to efficiently understand cellular function and organization. Finding complex nonlinear relationships in underlying functional modules may involve a long chain of PPIs and poses great challenges in a PPI network with an unevenly sparse and dense node distribution. To overcome these challenges, we propose AdaPPI, an adaptive graph convolution network for predicting protein functional modules in PPI networks. We first suggest an attributed graph node presentation algorithm. It effectively integrates protein gene ontology attributes and network topology, and adaptively aggregates low- or high-order graph structural information according to the node distribution by considering graph node smoothness. Based on the obtained node representations, core cliques and expansion algorithms are applied to find functional modules in PPI networks. Comprehensive performance evaluations and case studies indicate that the framework significantly outperforms state-of-the-art methods. We also present potential functional modules based on their confidence.
|
48
|
Chen S, Huang C, Wang L, Zhou S. A disease-related essential protein prediction model based on the transfer neural network. Front Genet 2023; 13:1087294. [PMID: 36685976 PMCID: PMC9845409 DOI: 10.3389/fgene.2022.1087294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2022] [Accepted: 12/14/2022] [Indexed: 01/06/2023] Open
Abstract
Essential proteins play important roles in the development and survival of organisms, and their mutations have been shown to drive common internal diseases with higher prevalence rates. Because traditional biological experiments are costly, this paper first designs an improved Transfer Neural Network (TNN) to extract raw features from multiple sources of biological information on proteins and then, based on this newly constructed network, designs a novel computational model called TNNM to infer essential proteins. Unlike a traditional Markov chain, the Transfer Neural Network uses the gradient descent algorithm to obtain the transition probability matrix automatically, which greatly improved the prediction accuracy of TNNM. Moreover, an additional antecedent memory coefficient and a bias term were introduced in the Transfer Neural Network, which further enhanced both the robustness and the non-linear expression ability of TNNM. Finally, to evaluate the identification performance of TNNM, intensive experiments were carried out separately on two well-known public databases, and the results show that TNNM achieves better performance than representative state-of-the-art prediction models in terms of both predictive accuracy and the decline rate of accuracy. Therefore, TNNM may play an important role in key protein prediction in the future.
Affiliation(s)
- Sisi Chen
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China
| | - Chiguo Huang
- Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China; *Correspondence: Chiguo Huang; Lei Wang; Shunxian Zhou
| | - Lei Wang
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China; Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China; *Correspondence: Chiguo Huang; Lei Wang; Shunxian Zhou
| | - Shunxian Zhou
- The First Hospital of Hunan University of Chinese Medicine, Changsha, Hunan, China; Big Data Innovation and Entrepreneurship Education Center of Hunan Province, Changsha University, Changsha, China; College of Information Science and Engineering, Hunan Women’s University, Changsha, Hunan, China; *Correspondence: Chiguo Huang; Lei Wang; Shunxian Zhou
| |
|
49
|
Xue X, Zhang W, Fan A. Comparative analysis of gene ontology-based semantic similarity measurements for the application of identifying essential proteins. PLoS One 2023; 18:e0284274. [PMID: 37083829 PMCID: PMC10121005 DOI: 10.1371/journal.pone.0284274] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2022] [Accepted: 03/28/2023] [Indexed: 04/22/2023] Open
Abstract
Identifying key proteins from protein-protein interaction (PPI) networks is one of the most fundamental and important tasks for computational biologists. However, the protein interactions obtained by high-throughput technology are characterized by a high false positive rate, which severely hinders the prediction accuracy of the current computational methods. In this paper, we propose a novel strategy to identify key proteins by constructing reliable PPI networks. Five Gene Ontology (GO)-based semantic similarity measurements (Jiang, Lin, Rel, Resnik, and Wang) are used to calculate the confidence scores for protein pairs under three annotation terms (Molecular function (MF), Biological process (BP), and Cellular component (CC)). The protein pairs with low similarity values are assumed to be low-confidence links, and the refined PPI networks are constructed by filtering the low-confidence links. Six topology-based centrality methods (the BC, DC, EC, NC, SC, and aveNC) are applied to test the performance of the measurements under the original network and refined network. We systematically compare the performance of the five semantic similarity metrics with the three GO annotation terms on four benchmark datasets, and the simulation results show that the performance of these centrality methods under refined PPI networks is relatively better than that under the original networks. Resnik with a BP annotation term performs best among all five metrics with the three annotation terms. These findings suggest the importance of semantic similarity metrics in measuring the reliability of the links between proteins and highlight the Resnik metric with the BP annotation term as a favourable choice.
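The filter-then-rank idea in this abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the protein names, the similarity scores, and the 0.5 threshold are hypothetical, and plain degree counting stands in for the centrality methods the abstract lists. It is not the authors' implementation.

```python
from collections import defaultdict


def refine_network(edges, similarity, threshold):
    """Keep only edges whose similarity score is at least `threshold`.

    `similarity` maps an unordered protein pair (a frozenset) to a score;
    pairs without a score are treated as 0.0 (an assumption of this sketch).
    """
    return [
        (a, b)
        for a, b in edges
        if similarity.get(frozenset((a, b)), 0.0) >= threshold
    ]


def degree_centrality(edges):
    """Count, for every protein, how many retained links it takes part in."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


# Hypothetical toy network and made-up similarity scores.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4")]
similarity = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P1", "P3")): 0.2,
    frozenset(("P2", "P3")): 0.7,
    frozenset(("P3", "P4")): 0.1,
}

refined = refine_network(edges, similarity, threshold=0.5)
# Rank proteins by degree (highest first), breaking ties by name.
ranking = sorted(degree_centrality(refined).items(), key=lambda kv: (-kv[1], kv[0]))
print(refined)
print(ranking)
```

In this toy case the two low-scoring links are dropped, so P3 loses its apparent hub status and P2 ranks first, which is the kind of reordering that filtering low-confidence links is meant to produce.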
Affiliation(s)
- Xiaoli Xue
- School of Science, East China Jiaotong University, Nanchang, China
| | - Wei Zhang
- School of Science, East China Jiaotong University, Nanchang, China
| | - Anjing Fan
- School of Computer and Information Engineering, Anyang Normal University, Anyang, China
| |
|
50
|
孙 玉, 刘 嘉, 孙 泽, 韩 建, 于 宁. [A generative adversarial network-based unsupervised domain adaptation method for magnetic resonance image segmentation]. SHENG WU YI XUE GONG CHENG XUE ZA ZHI = JOURNAL OF BIOMEDICAL ENGINEERING = SHENGWU YIXUE GONGCHENGXUE ZAZHI 2022; 39:1181-1188. [PMID: 36575088 PMCID: PMC9927195 DOI: 10.7507/1001-5515.202203009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2022] [Revised: 10/23/2022] [Indexed: 12/29/2022]
Abstract
Intelligent medical image segmentation methods have been rapidly developed and applied, but domain shift remains a significant challenge: segmentation performance degrades because of distribution differences between the source domain and the target domain. This paper proposes an unsupervised end-to-end domain adaptation method for medical image segmentation based on the generative adversarial network (GAN). A network training and adjustment model was designed, comprising a segmentation network and a discriminant network. In the segmentation network, the residual module was used as the basic module to increase feature reusability and reduce the difficulty of model optimization. The segmentation network also learned cross-domain features at the image feature level with the help of the discriminant network and a combination of segmentation loss and adversarial loss. The discriminant network was a convolutional neural network that used labels from the source domain to distinguish whether a segmentation result from the generator network came from the source domain or the target domain. The whole training process was unsupervised. The proposed method was tested on a public dataset of knee magnetic resonance (MR) images and on a clinical dataset from our cooperative hospital. With our method, the mean Dice similarity coefficient (DSC) of the segmentation results increased by 2.52% and 6.10% compared with the classical feature-level and image-level domain adaptation methods, respectively. The proposed method effectively improves the domain adaptation ability of the segmentation method, significantly improves the segmentation accuracy of the tibia and femur, and can better address the domain shift problem in MR image segmentation.
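For readers unfamiliar with the metric reported here, the Dice similarity coefficient of a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for two binary masks given as flat sequences of 0/1. The masks are made-up values, and returning 1.0 when both masks are empty is a convention assumed for this example, not something the abstract specifies.

```python
def dice_coefficient(pred, target):
    """Dice similarity coefficient between two binary masks (flat 0/1 sequences)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


# Hypothetical six-pixel masks: two overlapping pixels, three foreground pixels each.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred_mask, true_mask))
```

A value of 1 means the two masks coincide exactly and 0 means they share no foreground pixels.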
Affiliation(s)
- 玉波 孙
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
| | - 嘉男 刘
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
| | - 泽文 孙
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
| | - 建达 韩
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
| | - 宁波 于
- College of Artificial Intelligence, Nankai University, Tianjin 300350, P. R. China
- Tianjin Key Laboratory of Intelligent Robotics, Nankai University, Tianjin 300350, P. R. China
- Institute of Sports Medicine, Peking University Third Hospital, Beijing 100083, P. R. China
| |
|