1
Chou JC, Dassama LMK. Lipid Trafficking in Diverse Bacteria. Acc Chem Res 2025; 58:36-46. [PMID: 39680024 PMCID: PMC11713862 DOI: 10.1021/acs.accounts.4c00540] [Received: 08/21/2024] [Revised: 11/27/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]
Abstract
Lipids are essential for life and serve as cell envelope components, signaling molecules, and nutrients. For lipids to achieve their required functions, they need to be correctly localized. This requires the action of transporter proteins and an energy source. The current understanding of bacterial lipid transporters is limited to a few classes. Given the diversity of lipid species and the predicted existence of specific lipid transporters, many more transporters await discovery and characterization. These proteins could be prime targets for modulators that control bacterial cell proliferation and pathogenesis. One overarching goal of our research is to understand the molecular mechanisms of bacterial metabolite trafficking, including lipids, and to leverage that understanding to identify or engineer inhibitory ligands. In recent years, our work has revealed two novel lipid transport systems in bacteria: bacterial sterol transporters (Bst) A, B, and C in Methylococcus capsulatus and the TatT proteins in Enhygromyxa salina and Treponema pallidum. Both systems are composed of transporters that had been bioinformatically identified as being involved in the transport of other metabolites, but whose substrates had never been revealed. However, the genetic colocalization of the genes encoding BstABC with sterol biosynthetic enzymes in M. capsulatus suggested that they might recognize sterols as substrates. Also, homologues of TatTs are present in diverse bacteria but are overrepresented in bacteria deficient in de novo lipid synthesis or residing in nutrient-poor environments; we reasoned that these proteins might facilitate the transport of lipids. Our efforts to define the substrate scope of two TatT proteins revealed that they engage long-chain fatty acids.
Enabling the discovery of the BstABC system and the TatT proteins were bioinformatic analyses, quantitative measurements of protein-ligand equilibrium affinities, and high-resolution structural studies that provided remarkable insights into ligand binding cavities and the structural basis for ligand interaction. These approaches, in particular our bioinformatics and structural work, highlighted the diversity of protein sequences and structures amenable to lipid engagement. These observations led to the hypothesis that lipid handling proteins, in general and especially in the bacterial domain, can have diverse amino acid compositions and three-dimensional structures. As such, bioinformatic searches aimed at identifying them in poorly characterized genomes are likely to miss many candidates that diverge from well-characterized family members. This realization spurred efforts to understand the unifying features of all the lipid handling proteins we have characterized to date. To do this, we inspected the ligand binding sites of the proteins: they were remarkably hydrophobic and sometimes displayed a dichotomy of hydrophobic and hydrophilic amino acids, akin to the ligands that they accommodate in those cavities. Because of this, we reasoned that the physicochemical features of ligand binding cavities could be accurate predictors of a protein's propensity to bind lipids. This insight was leveraged to create the structure-based lipid-interacting pocket predictor (SLiPP), a machine-learning algorithm capable of identifying ligand cavities with physicochemical features consistent with those of known lipid binding sites. SLiPP is especially useful for poorly annotated genomes (such as those of bacterial pathogens), where it could reveal candidate proteins to be targeted for the development of antimicrobials.
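The reasoning above, that lipid binding cavities tend to be lined by hydrophobic residues, can be turned into a simple numerical descriptor. The sketch below is not SLiPP and does not reproduce its features or model; it assumes the residues lining a pocket are already known (for example, from a separate cavity-detection step) and scores them with the Kyte-Doolittle hydropathy scale. The function names and the 1.5 cutoff in `looks_lipid_like` are hypothetical choices for illustration only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def pocket_hydropathy(residues: str) -> float:
    """Mean hydropathy of the pocket-lining residues (one-letter codes)."""
    if not residues:
        raise ValueError("pocket has no lining residues")
    return sum(KYTE_DOOLITTLE[r] for r in residues.upper()) / len(residues)


def hydrophobic_fraction(residues: str) -> float:
    """Fraction of lining residues with positive (hydrophobic) hydropathy."""
    if not residues:
        raise ValueError("pocket has no lining residues")
    return sum(KYTE_DOOLITTLE[r] > 0 for r in residues.upper()) / len(residues)


def looks_lipid_like(residues: str, cutoff: float = 1.5) -> bool:
    """Hypothetical single-feature rule; a real predictor combines many features."""
    return pocket_hydropathy(residues) >= cutoff


if __name__ == "__main__":
    for pocket in ("LIVFAMLL", "LLVFDEKA", "DEKRNQSH"):
        print(pocket, round(pocket_hydropathy(pocket), 2),
              hydrophobic_fraction(pocket), looks_lipid_like(pocket))
```

An amphipathic pocket such as `LLVFDEKA` scores between the two extremes, which is one reason a single hydropathy cutoff would be too crude and a trained model over several descriptors is preferable.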
Affiliation(s)
- Jonathan Chiu-Chun Chou
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, California 94305, United States
- Laura M. K. Dassama
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, California 94305, United States
- Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, California 94305, United States
2
Giri SJ, Ibtehaz N, Kihara D. GO2Sum: generating human-readable functional summary of proteins from GO terms. NPJ Syst Biol Appl 2024; 10:29. [PMID: 38491038 PMCID: PMC10943200 DOI: 10.1038/s41540-024-00358-0] [Received: 12/14/2023] [Accepted: 03/05/2024] [Indexed: 03/18/2024]
Abstract
Understanding the biological functions of proteins is of fundamental importance in modern biology. To represent protein function, Gene Ontology (GO), a controlled vocabulary, is frequently used because it is easy for computer programs to handle and avoids open-ended text interpretation. In particular, the majority of current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describe a protein's function can be difficult for biologists to interpret. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions from concatenated GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model, which was trained on the entire web corpus, in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.
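As a rough illustration of the input side of this approach, the sketch below turns a protein's GO annotations into a single text string of the kind a sequence-to-sequence model such as T5 consumes. Several details are assumptions rather than GO2Sum's documented preprocessing: the hard-coded lookup table (a real pipeline would load term text from a Gene Ontology release), the use of term names in place of full term descriptions, the `; ` separator, and the `summarize: ` prefix, which follows the original T5 task-prefix convention.

```python
from typing import Dict, Iterable

# Small illustrative lookup of GO identifiers to term names.
GO_TERM_NAMES: Dict[str, str] = {
    "GO:0008289": "lipid binding",
    "GO:0006629": "lipid metabolic process",
    "GO:0005886": "plasma membrane",
}


def build_summarizer_input(go_ids: Iterable[str], prefix: str = "summarize: ") -> str:
    """Concatenate unique GO term names, in input order, into one prompt string."""
    names = []
    seen = set()
    for go_id in go_ids:
        if go_id in seen:
            continue  # skip duplicate annotations
        seen.add(go_id)
        names.append(GO_TERM_NAMES[go_id])  # KeyError flags an unknown ID
    return prefix + "; ".join(names) + "."


if __name__ == "__main__":
    print(build_summarizer_input(["GO:0008289", "GO:0006629", "GO:0008289"]))
```

The resulting string would then be tokenized and passed to the fine-tuned model; that step is omitted here because it depends on the checkpoint and library used.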
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
3
Chou JCC, Decosto CM, Chatterjee P, Dassama LMK. Rapid proteome-wide prediction of lipid-interacting proteins through ligand-guided structural genomics. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.01.26.577452. [PMID: 38352308 PMCID: PMC10862712 DOI: 10.1101/2024.01.26.577452] [Indexed: 02/20/2024]
Abstract
Lipids are primary metabolites that play essential roles in multiple cellular pathways. Alterations in lipid metabolism and transport are associated with infectious diseases and cancers. As such, proteins involved in lipid synthesis, trafficking, and modification are targets for therapeutic intervention. The ability to rapidly detect these proteins can accelerate their biochemical and structural characterization. However, it remains challenging to identify lipid binding motifs in proteins because they are poorly conserved at the amino acid level. Therefore, new bioinformatic tools that can detect conserved features in lipid binding sites are necessary. Here, we present the Structure-based Lipid-interacting Pocket Predictor (SLiPP), a structural bioinformatics algorithm that uses machine learning to detect protein cavities capable of binding lipids in experimental and AlphaFold-predicted protein structures. SLiPP, which can be used at proteome-wide scales, predicts lipid binding pockets with an accuracy of 96.8% and an F1 score of 86.9%. Our analyses revealed that the algorithm relies on hydrophobicity-related features to distinguish lipid binding pockets from those that bind other ligands. Use of the algorithm to detect lipid binding proteins in the proteomes of various bacteria, yeast, and humans has produced hits annotated or verified as lipid binding proteins, as well as many uncharacterized proteins whose functions are not discernible from sequence alone. Because of its ability to identify novel lipid binding proteins, SLiPP can spur the discovery of new lipid metabolic and trafficking pathways that can be targeted for therapeutic development.
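The gap between the reported accuracy (96.8%) and F1 score (86.9%) is consistent with lipid binding pockets being a minority of all pockets: accuracy rewards the many correctly rejected non-lipid pockets, whereas F1 is built only from true positives, false positives, and false negatives. The counts in the sketch below are invented for illustration and are not SLiPP's evaluation data.

```python
def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all pockets classified correctly (counts true negatives)."""
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; true negatives do not enter."""
    denom = 2 * tp + fp + fn  # 2TP / (2TP + FP + FN) equals 2PR / (P + R)
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical test set: 100 lipid pockets, 900 other pockets.
    tp, fn = 85, 15   # lipid pockets found / missed
    fp, tn = 15, 885  # other pockets wrongly flagged / correctly rejected
    print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
    print(f"F1       = {f1_score(tp, fp, fn):.3f}")
```

With these invented counts, accuracy is 0.970 while F1 is 0.850, because the 885 correctly rejected pockets lift accuracy but play no part in F1.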
Affiliation(s)
- Jonathan Chiu-Chun Chou
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305
- Cassandra M. Decosto
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305
- Poulami Chatterjee
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305
- Laura M. K. Dassama
- Department of Chemistry and Sarafan ChEM-H Institute, Stanford University, Stanford, CA 94305
- Department of Microbiology and Immunology, Stanford School of Medicine, Stanford, CA 94305
4
Huang Z, Cai Z, Zhang J, Gu Y, Wang J, Yang J, Lv G, Yang C, Zhang Y, Ji C, Jiang S. Integrating proteomics and metabolomics to elucidate the molecular network regulating of inosine monophosphate-specific deposition in Jingyuan chicken. Poult Sci 2023; 102:103118. [PMID: 37862870 PMCID: PMC10590753 DOI: 10.1016/j.psj.2023.103118] [Received: 05/31/2023] [Revised: 09/10/2023] [Accepted: 09/12/2023] [Indexed: 10/22/2023]
Abstract
Inosine monophosphate (IMP) plays a significant role in meat taste, yet the molecular mechanisms controlling IMP deposition in muscle tissue remain to be elucidated. The present study systematically explores the molecular network governing IMP deposition in different muscle regions of the Jingyuan chicken. Two muscle groups, the breast and the leg, were examined. Using nontargeted metabolomics, we screened and identified 20 metabolites that regulate IMP-specific deposition. Of these, 5 were identified as significant contributors to the regulation of IMP deposition. The results indicate that PGM1, a key enzyme involved in synthesis, is more abundant in the breast muscle than in the leg muscle, which may explain the greater deposition of IMP in the breast muscle. The activity of key enzymes involved in this process (PKM2, AK1, AMPD1) was higher in the breast muscle than in the leg muscle. In IMP degradation metabolism, the activity of the participating enzyme (PurH) was lower in the breast muscle than in the leg muscle. These findings suggest that the increased deposition of IMP in the breast muscle of Jingyuan chickens may result from elevated metabolism and reduced catabolism of key metabolites. In summary, a multi-omics strategy was used to assess the molecular network regulating IMP-specific deposition in different muscles of the Jingyuan chicken. These findings provide insight for the genetic improvement and molecular breeding of meat quality traits in high-quality broilers.
Affiliation(s)
- Zengwen Huang
- Agriculture College, Ningxia University, Ningxia, Yinchuan 750021, China; College of Animal Science, Xichang University, Sichuan, Xichang 615012, China; Xinjiang Taikun Group Co., Ltd., Xinjiang, Changji 831100, China
- Zhengyun Cai
- Agriculture College, Ningxia University, Ningxia, Yinchuan 750021, China
- Juan Zhang
- Agriculture College, Ningxia University, Ningxia, Yinchuan 750021, China
- Yaling Gu
- Agriculture College, Ningxia University, Ningxia, Yinchuan 750021, China
- Jing Wang
- College of Animal Science, Xichang University, Sichuan, Xichang 615012, China
- Jinzeng Yang
- Department of Human Nutrition, Food & Animal Sciences, College of Tropical Agriculture and Human Resources, University of Hawaii at Manoa, Manoa, HI 96822
- Gang Lv
- Xinjiang Taikun Group Co., Ltd., Xinjiang, Changji 831100, China
- Chaoyun Yang
- College of Animal Science, Xichang University, Sichuan, Xichang 615012, China
- Yi Zhang
- College of Animal Science, Xichang University, Sichuan, Xichang 615012, China
- Chen Ji
- College of Animal Science, Xichang University, Sichuan, Xichang 615012, China
- Shengwang Jiang
- College of Animal Science, Xichang University, Sichuan, Xichang 615012, China
5
Nussinov R, Liu Y, Zhang W, Jang H. Cell phenotypes can be predicted from propensities of protein conformations. Curr Opin Struct Biol 2023; 83:102722. [PMID: 37871498 PMCID: PMC10841533 DOI: 10.1016/j.sbi.2023.102722] [Received: 08/15/2023] [Revised: 09/26/2023] [Accepted: 09/27/2023] [Indexed: 10/25/2023]
Abstract
Proteins exist as dynamic conformational ensembles. Here we suggest that the propensities of the conformations can be predictors of cell function. The conformational states that the molecules preferentially visit can be viewed as phenotypic determinants, and mutations act by altering their relative propensities, and thereby the cell phenotype. Our examples include (i) inactive-state variants harboring cancer driver mutations that present active state-like conformational features, as in K-Ras4B G12V compared with other K-Ras4B G12X mutations; (ii) mutants of the same protein presenting vastly different phenotypic and clinical profiles, such as cancer and neurodevelopmental disorders; and (iii) alterations in the occupancies of conformational (sub)states that influence enzyme reactivity. Thus, protein conformational propensities can determine cell fate. They can also suggest the efficacy of allosteric drugs.
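One way to make the notion of conformational propensities quantitative is a Boltzmann distribution over discrete states, in which a mutation acts by shifting the relative free energy of one state. The sketch below assumes that simple discrete-state equilibrium model, which the abstract does not specify, and its two-state energies are hypothetical.

```python
import math
from typing import List

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def state_populations(free_energies: List[float], temperature: float = 310.0) -> List[float]:
    """Boltzmann populations of conformational states.

    free_energies: relative free energy of each state in kcal/mol.
    temperature: absolute temperature in K.
    """
    rt = R_KCAL * temperature
    g_min = min(free_energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(g - g_min) / rt) for g in free_energies]
    z = sum(weights)  # partition function over the listed states
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical two-state protein: [inactive, active], energies in kcal/mol.
    wild_type = state_populations([0.0, 1.0])
    mutant = state_populations([0.0, -0.5])  # mutation stabilizes the active state
    print(f"active fraction, wild type: {wild_type[1]:.2f}")
    print(f"active fraction, mutant:    {mutant[1]:.2f}")
```

With these invented energies, stabilizing the active state by 1.5 kcal/mol raises its population from about 0.16 to about 0.69 at 310 K, illustrating how a modest energetic change can flip which state dominates the ensemble.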
Affiliation(s)
- Ruth Nussinov
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Department of Human Molecular Genetics and Biochemistry, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA.
- Yonglan Liu
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Wengang Zhang
- Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
- Hyunbum Jang
- Computational Structural Biology Section, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA; Cancer Innovation Laboratory, National Cancer Institute, Frederick, MD 21702, USA
6
Giri SJ, Ibtehaz N, Kihara D. GO2Sum: Generating Human Readable Functional Summary of Proteins from GO Terms. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.10.566665. [PMID: 38014080 PMCID: PMC10680659 DOI: 10.1101/2023.11.10.566665] [Indexed: 11/29/2023]
Abstract
Understanding the biological functions of proteins is of fundamental importance in modern biology. To represent protein function, Gene Ontology (GO), a controlled vocabulary, is frequently used because it is easy for computer programs to handle and avoids open-ended text interpretation. In particular, the majority of current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describe a protein's function can be difficult for biologists to interpret. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions from concatenated GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model, which was trained on the entire web corpus, in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
7
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP allows protein function prediction using function-aware domain embedding representations. Commun Biol 2023; 6:1103. [PMID: 37907681 PMCID: PMC10618451 DOI: 10.1038/s42003-023-05476-9] [Received: 05/17/2023] [Accepted: 10/17/2023] [Indexed: 11/02/2023]
Abstract
Domains are functional and structural units of proteins that govern the various biological functions the proteins perform. Therefore, the domains in a protein can serve as a functional representation of that protein. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed proved effective in function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, substantially outperformed state-of-the-art function predictors. Additionally, Domain-PFP performed competitively in the CAFA3 evaluation, achieving the best overall performance among the top teams that participated in the assessment.
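The idea of representing a protein through function-aware domain vectors can be sketched with plain co-occurrence counts. This is a simplified stand-in, not Domain-PFP's self-supervised embedding method: each domain's vector here is just the normalized distribution of GO terms it co-occurs with in annotated training proteins, and a protein is scored by averaging the vectors of its domains. The domain labels (`D1`, `D2`) and GO labels (`GO_a`, `GO_b`) are placeholders, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Profile = Dict[str, float]


def domain_go_profiles(training: Iterable[Tuple[List[str], List[str]]]) -> Dict[str, Profile]:
    """Build row-normalized domain -> GO co-occurrence profiles.

    training: (domains, go_terms) pairs, one per annotated protein.
    """
    counts: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for domains, go_terms in training:
        for d in set(domains):
            for g in set(go_terms):
                counts[d][g] += 1.0  # one co-occurrence per protein
    profiles = {}
    for d, row in counts.items():
        total = sum(row.values())
        profiles[d] = {g: c / total for g, c in row.items()}
    return profiles


def score_go_terms(domains: List[str], profiles: Dict[str, Profile]) -> Profile:
    """Average the profiles of a protein's known domains into GO-term scores."""
    known = [profiles[d] for d in set(domains) if d in profiles]
    if not known:
        return {}  # no known domains, so no prediction
    scores: Dict[str, float] = defaultdict(float)
    for prof in known:
        for g, p in prof.items():
            scores[g] += p / len(known)
    return dict(scores)


if __name__ == "__main__":
    training = [(["D1"], ["GO_a"]), (["D1", "D2"], ["GO_a", "GO_b"]), (["D2"], ["GO_b"])]
    profiles = domain_go_profiles(training)
    print(score_go_terms(["D1"], profiles))
    print(score_go_terms(["D1", "D2"], profiles))
```

A learned embedding can, in principle, capture relationships that raw counts miss, such as functional similarity between domains that never co-occur; the count-based version only makes the domain-to-GO mapping concrete.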
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
8
Ibtehaz N, Kagaya Y, Kihara D. Domain-PFP: Protein Function Prediction Using Function-Aware Domain Embedding Representations. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.08.23.554486. [PMID: 37662252 PMCID: PMC10473699 DOI: 10.1101/2023.08.23.554486] [Indexed: 09/05/2023]
Abstract
Domains are functional and structural units of proteins that govern the various biological functions the proteins perform. Therefore, the domains in a protein can serve as a functional representation of that protein. Here, we employ a self-supervised protocol to derive functionally consistent representations for domains by learning domain-Gene Ontology (GO) co-occurrences and associations. The domain embeddings we constructed proved effective in function prediction tasks. Extensive evaluations showed that protein representations using the domain embeddings are superior to those of large-scale protein language models in GO prediction tasks. Moreover, the new function prediction method built on the domain embeddings, named Domain-PFP, significantly outperformed state-of-the-art function predictors. Additionally, Domain-PFP performed competitively in the CAFA3 evaluation, achieving the best overall performance among the top teams that participated in the assessment.
Affiliation(s)
- Nabil Ibtehaz
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Yuki Kagaya
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, United States
- Department of Biological Sciences, Purdue University, West Lafayette, IN, United States