1
Meshcheryakova OV, Bogdanov MA, Efimov AV. Relationship between thermal stability of collagens and the fraction of hydrophobic residues in their molecules. J Struct Biol 2024; 216:108114. [PMID: 39094716 DOI: 10.1016/j.jsb.2024.108114] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Revised: 07/18/2024] [Accepted: 07/30/2024] [Indexed: 08/04/2024]
Abstract
In this study, a database of the thermal stability of collagens and their synthetic analogues has been compiled taking into account literature sources. In total, our database includes 1200 records. As a result of a comparative theoretical analysis of the collected experimental data, the relationship between the melting temperature (Tm) or denaturation temperature (Td) of collagens and the fraction of hydrophobic residues (f) in their molecules has been established. It is shown that this relationship is linear: the larger the f value, the higher the denaturation or melting temperature of a given collagen.
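The linear relationship described in this abstract can be illustrated with a minimal sketch: compute the hydrophobic fraction f of a sequence, then fit a straight line of Tm against f by ordinary least squares. The residue set `HYDROPHOBIC` and both function names are assumptions made for illustration; the abstract does not say which residues the authors counted as hydrophobic or how they performed the fit.

```python
# Illustrative assumption: which one-letter codes count as hydrophobic.
# The abstract does not list the residues the authors used.
HYDROPHOBIC = frozenset("AVLIMFWP")


def hydrophobic_fraction(sequence, hydrophobic=HYDROPHOBIC):
    """Return f, the fraction of residues in `sequence` that are hydrophobic."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in hydrophobic) / len(seq)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

In use, one would pass measured (f, Tm) pairs from a database such as the one described to `fit_line`; a positive slope would correspond to the trend the abstract reports.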
Affiliation(s)
- Olga V Meshcheryakova
- Centre for Biomedical Research of the Karelian Research Centre of the Russian Academy of Sciences, Pushkinskaya St., 11, Petrozavodsk 185910 Russia.
- Maxim A Bogdanov
- Laboratory of Intellectual Services and Applications, ITMO University, Kronverksky Pr., 49, St. Petersburg 197101, Russia
- Alexander V Efimov
- Institute of Protein Research, Russian Academy of Sciences, Institutskaya St., 4, Pushchino, Moscow Region 142290 Russia.
2
Buehler MJ. Generative Retrieval-Augmented Ontologic Graph and Multiagent Strategies for Interpretive Large Language Model-Based Materials Design. ACS Engineering Au 2024; 4:241-277. [PMID: 38646516 PMCID: PMC11027160 DOI: 10.1021/acsengineeringau.3c00058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2023] [Revised: 12/06/2023] [Accepted: 12/07/2023] [Indexed: 04/23/2024]
Abstract
Transformer neural networks show promising capabilities, in particular for uses in materials analysis, design, and manufacturing, including their capacity to work effectively with human language, symbols, code, and numerical data. Here, we explore the use of large language models (LLMs) as a tool that can support engineering analysis of materials, applied to retrieving key information about subject areas, developing research hypotheses, discovery of mechanistic relationships across disparate areas of knowledge, and writing and executing simulation codes for active knowledge generation based on physical ground truths. Moreover, when used as sets of AI agents with specific features, capabilities, and instructions, LLMs can provide powerful problem-solution strategies for applications in analysis and design problems. Our experiments focus on using a fine-tuned model, MechGPT, developed based on training data in the mechanics of materials domain. We first affirm how fine-tuning endows LLMs with a reasonable understanding of subject area knowledge. However, when queried outside the context of learned matter, LLMs can have difficulty recalling correct information and may hallucinate. We show how this can be addressed using retrieval-augmented Ontological Knowledge Graph strategies. The graph-based strategy helps us not only to discern how the model understands what concepts are important but also how they are related, which significantly improves generative performance and also naturally allows for injection of new and augmented data sources into generative AI algorithms. We find that the additional feature of relatedness provides advantages over regular retrieval augmentation approaches and not only improves LLM performance but also provides mechanistic insights for exploration of a material design process. 
Illustrated for a use case of relating distinct areas of knowledge, here, music and proteins, such strategies can also provide an interpretable graph structure with rich information at the node, edge, and subgraph level that provides specific insights into mechanisms and relationships. We discuss other approaches to improve generative qualities, including nonlinear sampling strategies and agent-based modeling that offer enhancements over single-shot generations, whereby LLMs are used to both generate content and assess content against an objective target. Examples provided include complex question answering, code generation, and execution in the context of automated force-field development from actively learned density functional theory (DFT) modeling and data analysis.
Affiliation(s)
- Markus J. Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Department of Mechanical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
3
Jia C, Li H, Yang Z, Xu R, Wang L, Li H. From medical strategy to foodborne prophylactic strategy: Stabilizing dental collagen with aloin. Food Sci Nutr 2024; 12:830-842. [PMID: 38370038 PMCID: PMC10867467 DOI: 10.1002/fsn3.3795] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 10/10/2023] [Accepted: 10/13/2023] [Indexed: 02/20/2024] Open
Abstract
Infectious oral diseases are longstanding global public health concerns. However, traditional medical approaches to address these diseases are costly, traumatic, and prone to relapse. Here, we propose a foodborne prophylactic strategy using aloin to safeguard dental collagen. The effect of aloin on the stability of dental collagen was evaluated by treating dentin with a solution containing aloin (0.1 mg/mL) for 2 min. This concentration is comparable to the natural aloin content of edible aloe. Furthermore, we investigated the mechanisms underlying the interactions between aloin and dentin collagen. Our findings, obtained through fluorescence spectroscopy, attenuated total reflection Fourier transform infrared spectroscopy, Gaussian peak fitting, circular dichroism spectroscopy, and X-ray diffraction, revealed that aloin interacts with dental collagen through noncovalent bonding, specifically hydrogen bonding in situ. This interaction leads to a reduction in the distance between molecules and an increase in the proportion of stable α-helical chains in the dental collagen. The ultimate tensile strength and thermogravimetric analysis demonstrated that dental collagen treated with aloin exhibited improved mechanical strength and thermostability. Additionally, the release of hydroxyproline, cross-linked carboxy-terminal telopeptide of type I collagen, and C-terminal cross-linked telopeptide of type I collagen, along with weight loss, indicated an enhancement in the enzymatic stability of dental collagen. These findings suggest that aloin administration could be a daily, nondestructive, and cost-effective strategy for managing infectious oral diseases.
Affiliation(s)
- Chongzhi Jia
- Department of Stomatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Hua Li
- Department of Stomatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Zhongliang Yang
- Department of Stomatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Rongchen Xu
- Department of Stomatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
- Department of Stomatology, The Third Medical Center, Chinese PLA General Hospital, Beijing, China
- Lijun Wang
- Department of Stomatology, The Third Medical Center, Chinese PLA General Hospital, Beijing, China
- Hongbo Li
- Department of Stomatology, The First Medical Center, Chinese PLA General Hospital, Beijing, China
4
Graham JJ, Keten S. Increase in Charge and Density Improves the Strength and Toughness of Mussel Foot Protein 5 Inspired Protein Materials. ACS Biomater Sci Eng 2023; 9:4662-4672. [PMID: 37417954 DOI: 10.1021/acsbiomaterials.3c00088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/08/2023]
Abstract
Mussel foot protein 5 (fp5) found in the adhesive byssal plaque of Mediterranean mussel Mytilus galloprovincialis exhibits exceptional underwater adhesion to diverse surfaces to the extent that adhesion strength typically exceeds the cohesive strength of the plaque. While sequence effects such as presence of charged residues, metal ion coordination, and high catechol content have been identified to govern fp5's interaction with surfaces, molecular contributors to its cohesive strength remain to be fully understood. Addressing this issue is critical for designing mussel-inspired sequences for new adhesives and biomaterials enabled by synthetic biology. Here we carry out all-atom molecular dynamics simulations on hydrated model fp5 biopolymer melts to understand how sequence features such as tyrosine and charge content affect packing density and inter-residue and ionic interaction strengths and consequently influence the cohesive strength and toughness. Systematic serine (S) substitutions for lysine (K), arginine (R) and tyrosine (Y) residues reveal that Y to S substitution surprisingly results in improvement of cohesive strength due to densification of the material by removal of steric hindrances, whereas the removal of charge in K and R to S substitutions has a detrimental impact on strength and toughness as it reduces cohesive interactions facilitated by electrostatic interactions. Additionally, melts formed from split fp5 sequences with only C or N terminal halves show distinct mechanical responses that further illustrate the role of charge. Our findings provide new insights for designing materials that could potentially surpass the performance of existing biomolecular and bioinspired adhesives, specifically by tailoring sequences for balancing charge and excluded volume effects.
Affiliation(s)
- Jacob J Graham
- Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, United States
- Sinan Keten
- Northwestern University, Department of Mechanical Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, United States
- Northwestern University, Department of Civil and Environmental Engineering, 2145 Sheridan Road, Evanston, Illinois 60208, United States
5
Buehler MJ. Emerging trends in multi-modal multi-dimensional biomechanical materials research. J Mech Behav Biomed Mater 2023; 141:105754. [PMID: 36906507 DOI: 10.1016/j.jmbbm.2023.105754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/08/2023]
Affiliation(s)
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA; Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, MA, 02139, USA.
6
Kim Y, Yoon T, Park WB, Na S. Predicting mechanical properties of silk from its amino acid sequences via machine learning. J Mech Behav Biomed Mater 2023; 140:105739. [PMID: 36871478 DOI: 10.1016/j.jmbbm.2023.105739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2022] [Revised: 02/12/2023] [Accepted: 02/21/2023] [Indexed: 02/25/2023]
Abstract
The silk fiber is increasingly being sought for its superior mechanical properties, biocompatibility, and eco-friendliness, making it promising as a base material for various applications. One of the characteristics of protein fibers, such as silk, is that their mechanical properties are significantly dependent on the amino acid sequence. Numerous studies have sought to determine the specific relationship between the amino acid sequence of silk and its mechanical properties, but this relationship has yet to be clarified. Other fields have adopted machine learning (ML) to relate inputs, such as the ratios of different material compositions, to the resulting mechanical properties. We have proposed a method to convert the amino acid sequence into numerical values for input and succeeded in predicting the mechanical properties of silk from its amino acid sequences. Our study sheds light on predicting mechanical properties of silk fiber from respective amino acid sequences.
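The abstract does not describe how the sequence is converted into numerical values. As a generic stand-in, the sketch below uses an amino acid composition vector: 20 numbers giving the fraction of each standard residue. This is one common encoding, not necessarily the authors' method, and the name `composition_vector` is hypothetical.

```python
# The 20 standard amino acids, in a fixed order so every sequence maps to
# a vector with the same layout.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard one-letter codes are ignored.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for aa in sequence.upper():
        if aa in counts:
            counts[aa] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]
```

A composition vector discards residue order, so a model that needs motif or position information would require a richer encoding, such as one-hot vectors per position.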
7
Unraveling the molecular mechanism of collagen flexibility during physiological warmup using molecular dynamics simulation and machine learning. Comput Struct Biotechnol J 2023; 21:1630-1638. [PMID: 36860343 PMCID: PMC9969283 DOI: 10.1016/j.csbj.2023.02.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2022] [Revised: 02/08/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023] Open
Abstract
Physiological warmup plays an important role in reducing the injury risk in different sports. In response to the associated temperature increase, the muscle and tendon soften and become easily stretched. In this study, we focused on type I collagen, the main component of the Achilles tendon, to unveil the molecular mechanism of collagen flexibility upon slight heating and to develop a model to predict the strain of collagen sequences. We used molecular dynamics approaches to simulate the molecular structures and mechanical behavior of the gap and overlap regions in type I collagen at 307 K, 310 K, and 313 K. The results showed that the molecular model in the overlap region is more sensitive to temperature increases. Upon increasing the temperature by 3 degrees Celsius, the end-to-end distance and Young's modulus of the overlap region decreased by 5% and 29.4%, respectively. The overlap region became more flexible than the gap region at higher temperatures. GAP-GPA and GNK-GSK triplets are critical for providing molecular flexibility upon heating. A machine learning model developed from the molecular dynamics simulation results showed good performance in predicting the strain of collagen sequences at a physiological warmup temperature. The strain-predictive model could be applied to future collagen designs to obtain desirable temperature-dependent mechanical properties.
8
Fiala T, Barros EP, Heeb R, Riniker S, Wennemers H. Predicting Collagen Triple Helix Stability through Additive Effects of Terminal Residues and Caps. Angew Chem Int Ed Engl 2023; 62:e202214728. [PMID: 36409045 PMCID: PMC10108146 DOI: 10.1002/anie.202214728] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/07/2022] [Revised: 11/20/2022] [Accepted: 11/21/2022] [Indexed: 11/23/2022]
Abstract
Collagen model peptides (CMPs) consisting of proline-(2S,4R)-hydroxyproline-glycine (POG) repeats have provided a breadth of knowledge of the triple helical structure of collagen, the most abundant protein in mammals. Predictive tools for triple helix stability have, however, lagged behind since the effect of CMPs with different frames ([POG]n, [OGP]n, or [GPO]n) and capped or uncapped termini have so far been underestimated. Here, we elucidated the impact of the frame, terminal functional group and its charge on the stability of collagen triple helices. Combined experimental and theoretical studies with frame-shifted, capped and uncapped CMPs revealed that electrostatic interactions, strand preorganization, interstrand H-bonding, and steric repulsion at the termini contribute to triple helix stability. We show that these individual contributions are additive and allow for the prediction of the melting temperatures of CMP trimers.
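The additivity claim can be read as a simple sum: a reference melting temperature plus one increment per terminal feature. The sketch below shows only that arithmetic. The function name `predict_tm`, the feature labels, and every number in the usage note are hypothetical placeholders, not values from the paper.

```python
def predict_tm(base_tm, contributions, features):
    """Add per-feature increments (in degrees C) to a reference Tm.

    base_tm       -- Tm of the reference peptide (assumed to be measured)
    contributions -- dict mapping feature name to its Tm increment
    features      -- names of the features present in the peptide of interest
    """
    missing = [f for f in features if f not in contributions]
    if missing:
        raise KeyError(f"no contribution known for: {missing}")
    return base_tm + sum(contributions[f] for f in features)
```

For example, with placeholder increments `{"cap_a": 2.0, "cap_b": -1.5}` and a placeholder reference of 40.0, `predict_tm(40.0, increments, ["cap_a", "cap_b"])` adds both increments to the reference. Real increments would have to come from measurements such as those the abstract describes.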
Affiliation(s)
- Tomas Fiala
- Laboratory of Organic Chemistry, ETH Zürich, Vladimir-Prelog-Weg 3, 8093, Zürich, Switzerland
- Emilia P Barros
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093, Zürich, Switzerland
- Rahel Heeb
- Laboratory of Organic Chemistry, ETH Zürich, Vladimir-Prelog-Weg 3, 8093, Zürich, Switzerland
- Sereina Riniker
- Laboratory of Physical Chemistry, ETH Zürich, Vladimir-Prelog-Weg 2, 8093, Zürich, Switzerland
- Helma Wennemers
- Laboratory of Organic Chemistry, ETH Zürich, Vladimir-Prelog-Weg 3, 8093, Zürich, Switzerland
9
Li G, Buric F, Zrimec J, Viknander S, Nielsen J, Zelezniak A, Engqvist MKM. Learning deep representations of enzyme thermal adaptation. Protein Sci 2022; 31:e4480. [PMID: 36261883 PMCID: PMC9679980 DOI: 10.1002/pro.4480] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/28/2022] [Revised: 09/02/2022] [Accepted: 10/15/2022] [Indexed: 12/14/2022]
Abstract
Temperature is a fundamental environmental factor that shapes the evolution of organisms. Learning thermal determinants of protein sequences in evolution thus has profound significance for basic biology, drug discovery, and protein engineering. Here, we use a data set of over 3 million BRENDA enzymes labeled with optimal growth temperatures (OGTs) of their source organisms to train a deep neural network model (DeepET). The protein-temperature representations learned by DeepET provide a temperature-related statistical summary of protein sequences and capture structural properties that affect thermal stability. For prediction of enzyme optimal catalytic temperatures and protein melting temperatures via a transfer learning approach, our DeepET model outperforms classical regression models trained on rationally designed features and other deep-learning-based representations. DeepET thus holds promise for understanding enzyme thermal adaptation and guiding the engineering of thermostable enzymes.
Affiliation(s)
- Gang Li
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Sandra Viknander
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Jens Nielsen
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- BioInnovation Institute, Copenhagen N, Denmark
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Life Sciences Centre, Institute of Biotechnology, Vilnius University, Vilnius, Lithuania
- Randall Centre for Cell & Molecular Biophysics, King's College London, New Hunt's House, Guy's Campus, SE1 1UL, London, UK
- Martin K. M. Engqvist
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Enginzyme AB, Stockholm, Sweden
10
Discovering design principles of collagen molecular stability using a genetic algorithm, deep learning, and experimental validation. Proc Natl Acad Sci U S A 2022; 119:e2209524119. [PMID: 36161946 DOI: 10.1073/pnas.2209524119] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/18/2022] Open
Abstract
Collagen is the most abundant structural protein in humans, providing crucial mechanical properties, including high strength and toughness, in tissues. Collagen-based biomaterials are, therefore, used for tissue repair and regeneration. Utilizing collagen effectively during materials processing ex vivo and subsequent function in vivo requires stability over wide temperature ranges to avoid denaturation and loss of structure, measured as melting temperature (Tm). Although significant research has been conducted on understanding how collagen primary amino acid sequences correspond to Tm values, a robust framework to facilitate the design of collagen sequences with specific Tm remains a challenge. Here, we develop a general model using a genetic algorithm within a deep learning framework to design collagen sequences with specific Tm values. We report 1,000 de novo collagen sequences, and we show that we can efficiently use this model to generate collagen sequences and verify their Tm values using both experimental and computational methods. We find that the model accurately predicts Tm values within a few degrees centigrade. Further, using this model, we conduct a high-throughput study to identify the most frequently occurring collagen triplets that can be directly incorporated into collagen. We further discovered that the number of hydrogen bonds within collagen calculated with molecular dynamics (MD) is directly correlated to the experimental measurement of triple-helical quality. Ultimately, we see this work as a critical step to helping researchers develop collagen sequences with specific Tm values for intended materials manufacturing methods and biomedical applications, realizing a mechanistic materials by design paradigm.
11
Khare E, Gonzalez-Obeso C, Kaplan DL, Buehler MJ. CollagenTransformer: End-to-End Transformer Model to Predict Thermal Stability of Collagen Triple Helices Using an NLP Approach. ACS Biomater Sci Eng 2022; 8:4301-4310. [PMID: 36149671 DOI: 10.1021/acsbiomaterials.2c00737] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Indexed: 11/28/2022]
Abstract
Collagen is one of the most important structural proteins in biology, and its structural hierarchy plays a crucial role in many mechanically important biomaterials. Here, we demonstrate how transformer models can be used to predict, directly from the primary amino acid sequence, the thermal stability of collagen triple helices, measured via the melting temperature Tm. We report two distinct transformer architectures to compare performance. First, we train a small transformer model from scratch, using our collagen data set featuring only 633 sequence-to-Tm pairings. Second, we use a large pretrained transformer model, ProtBERT, and fine-tune it for a particular downstream task by utilizing sequence-to-Tm pairings, using a deep convolutional network to translate natural language processing BERT embeddings into required features. Both the small transformer model and the fine-tuned ProtBERT model have similar R² values of test data (R² = 0.84 vs 0.79, respectively), but the ProtBERT is a much larger pretrained model that may not always be applicable for other biological or biomaterials questions. Specifically, we show that the small transformer model requires only 0.026% of the number of parameters compared to the much larger model but reaches almost the same accuracy for the test set. We compare the performance of both models against 71 newly published sequences for which Tm has been obtained as a validation set and find reasonable agreement, with ProtBERT outperforming the small transformer model. The results presented here are, to our best knowledge, the first demonstration of the use of transformer models for relatively small data sets and for the prediction of specific biophysical properties of interest. We anticipate that the work presented here serves as a starting point for transformer models to be applied to other biophysical problems.
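The model comparison above is reported in terms of the coefficient of determination (R²) on test data. As a reminder of what that metric measures, here is a minimal sketch of the standard definition, 1 minus the ratio of the residual to the total sum of squares. The function name `r_squared` is illustrative, and the abstract does not state which implementation the authors used.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1 - ss_res / ss_tot
```

A value of 1 means the predictions match the measured Tm values exactly, while a value of 0 means the model does no better than always predicting the mean of the measured values.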
Affiliation(s)
- Eesha Khare
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States; Department of Materials Science and Engineering, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States
- David L Kaplan
- Tufts University, Medford, Massachusetts 02155, United States
- Markus J Buehler
- Laboratory for Atomistic and Molecular Mechanics (LAMM), Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States; Center for Computational Science and Engineering, Schwarzman College of Computing, Massachusetts Institute of Technology, 77 Massachusetts Ave., Cambridge, Massachusetts 02139, United States