1
Vogel G, Weber JM. Inverse design of copolymers including stoichiometry and chain architecture. Chem Sci 2025; 16:1161-1178. [PMID: 39697419 PMCID: PMC11650379 DOI: 10.1039/d4sc05900j] [Received: 09/12/2024] [Accepted: 12/03/2024] [Indexed: 12/20/2024]
Abstract
The demand for innovative synthetic polymers with improved properties is high, but their structural complexity and vast design space hinder rapid discovery. Machine learning-guided molecular design is a promising approach to accelerate polymer discovery. However, the scarcity of labeled polymer data and the complex hierarchical structure of synthetic polymers make generative design particularly challenging. We extend current state-of-the-art approaches to generate not only repeating units but also monomer ensembles, including their stoichiometry and chain architecture. We build upon a recent polymer representation that includes the stoichiometries and chain architectures of monomer ensembles and develop a novel variational autoencoder (VAE) architecture that encodes a graph and decodes a string. Using a semi-supervised setup, we enable the handling of partly labeled datasets, which can be beneficial for domains with a small corpus of labeled data. Our model learns a continuous, well-organized latent space (LS) that enables de novo generation of copolymer structures with different monomer stoichiometries and chain architectures. In an inverse design case study, we demonstrate our model for the in silico discovery of novel conjugated copolymer photocatalysts for hydrogen production by optimizing the polymer's electron affinity and ionization potential in the latent space.
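The VAE described in this abstract rests on two standard ingredients that can be shown compactly. The sketch below is a generic, hypothetical illustration, not the authors' graph-encoder/string-decoder model: the reparameterization trick for sampling a latent vector z = mu + sigma * eps, and the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal prior, the regularizer commonly credited with keeping a VAE latent space smooth. Plain Python lists stand in for tensors, and the function names are assumptions.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, eps ~ N(0, 1), with sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.5, -1.0], [0.0, math.log(0.25)]  # toy encoder outputs (hypothetical)
    z = reparameterize(mu, logvar, rng)
    print("z =", z, "KL =", kl_to_standard_normal(mu, logvar))
```

In a semi-supervised variant, a property-prediction loss would typically be added only for the samples that carry labels; that part is not shown here.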
Affiliation(s)
- Gabriel Vogel
- Department of Intelligent Systems, Delft University of Technology Delft 2629 HZ The Netherlands
- Jana M Weber
- Department of Intelligent Systems, Delft University of Technology Delft 2629 HZ The Netherlands
2
Petersen SR, Kohan Marzagão D, Gregory GL, Huang Y, Clifton DA, Williams CK, Siviour CR. Property Prediction of Bio-Derived Block Copolymer Thermoplastic Elastomers Using Graph Kernel Methods. Angew Chem Int Ed Engl 2025; 64:e202411097. [PMID: 39612309 DOI: 10.1002/anie.202411097] [Received: 06/12/2024] [Revised: 10/25/2024] [Accepted: 11/29/2024] [Indexed: 12/01/2024]
Abstract
Increasing the diversity of bio-based polymers is needed to address the combined problems of plastic pollution and greenhouse gas emissions. The magnitude of these problems necessitates rapid discovery of new materials; however, identifying appropriate chemistries may be slow with current iterative methods. Machine learning (ML) methods could significantly expedite new material discovery and property identification. Here, PolyAGM, an ML algorithm using graph kernel methods, is introduced and used to predict the properties of block copolymers and to identify the responsible structural 'motifs'. It applies a "fingerprinting" method to convert graph representations of polymers into numerical vectors. The graphs explicitly encode the atoms and bonds of the entire copolymer, so that the sequence of chemical features and the polymer chain length are included, alongside relevant stereochemical information. PolyAGM gives predictions for both thermal and mechanical properties that are in good agreement with experimental measurements. This work focuses on predicting the properties of bio-derived ABA-block polymer thermoplastic elastomers, but the general fingerprinting technique of PolyAGM should be relevant to other application fields.
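The abstract does not spell out PolyAGM's exact fingerprinting scheme. As a hedged illustration of how a graph kernel can turn an atom-and-bond graph into a numerical vector, the sketch below uses Weisfeiler-Lehman (WL) subtree features, a widely used graph-kernel family; whether PolyAGM resembles this is an assumption. Atoms are nodes with element labels, bonds are undirected edges, and all function names are hypothetical.

```python
from collections import Counter


def wl_fingerprint(labels, edges, iterations=2):
    """Weisfeiler-Lehman subtree features: count node labels after each refinement round."""
    neighbors = {node: [] for node in labels}
    for a, b in edges:
        neighbors[a].append(b)
        neighbors[b].append(a)
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # New label = old label plus the sorted multiset of neighbour labels
        # (the right-hand side reads the previous round's labels).
        current = {
            node: current[node] + "(" + ",".join(sorted(current[n] for n in neighbors[node])) + ")"
            for node in current
        }
        features.update(current.values())
    return features


def to_vector(features, vocabulary):
    """Project a feature Counter onto a fixed, ordered vocabulary of labels."""
    return [features.get(key, 0) for key in vocabulary]


def kernel(f1, f2):
    """Linear WL kernel: dot product of two feature-count maps."""
    return sum(count * f2.get(key, 0) for key, count in f1.items())


if __name__ == "__main__":
    # Toy C-C-O fragment (hypothetical, hydrogens omitted).
    fp = wl_fingerprint({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
    print(to_vector(fp, sorted(fp)))
```

Production WL implementations usually compress each new label to a short integer ID instead of growing strings; strings are kept here only for readability.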
Affiliation(s)
- Shannon R Petersen
- Department of Chemistry, University of Oxford, Mansfield Rd, Oxford, OX1 3TA, UK
- David Kohan Marzagão
- Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
- Georgina L Gregory
- Department of Chemistry, University of Oxford, Mansfield Rd, Oxford, OX1 3TA, UK
- Yichen Huang
- Department of Computer Science, University of Oxford, 7 Parks Road, Oxford, OX1 3QG, UK
- David A Clifton
- Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
- Charlotte K Williams
- Department of Chemistry, University of Oxford, Mansfield Rd, Oxford, OX1 3TA, UK
- Clive R Siviour
- Department of Engineering Science, University of Oxford, Parks Road, Oxford, OX1 3PJ, UK
3
Ethier J, Antoniuk ER, Brettmann B. Predicting polymer solubility from phase diagrams to compatibility: a perspective on challenges and opportunities. Soft Matter 2024; 20:5652-5669. [PMID: 38995233 DOI: 10.1039/d4sm00590b] [Indexed: 07/13/2024]
Abstract
Polymer processing, purification, and self-assembly have significant roles in the design of polymeric materials. Understanding how polymers behave in solution (e.g., their solubility and chemical properties) can improve our control over material properties via their processing-structure-property relationships. For many decades the polymer science community has relied on thermodynamic and physics-based models to aid in this endeavor, but these models rely on disparate data sets and use-case scenarios. Hence, significant challenges remain in predicting a priori the solubility of a polymer, whether for selecting sustainable solvents, obtaining thermodynamic parameters for phase separation, or navigating the coexistence curve. This perspective discusses the different approaches of applying computational tools to predict polymer solubility, with a significant focus on machine learning techniques to capture the rapid progress in that space. We examine the challenges and opportunities that remain for creating a comprehensive solubility toolset that can accelerate the design of a broad range of applications, including films, membranes, and pharmaceuticals.
Affiliation(s)
- Jeffrey Ethier
- Materials and Manufacturing Directorate, Air Force Research Laboratory, Wright-Patterson AFB, Ohio 45433, USA
- Evan R Antoniuk
- Materials Science Division, Physical and Life Sciences Directorate, Lawrence Livermore National Laboratory, Livermore, California 94550, USA
- Blair Brettmann
- Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
- Materials Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332, USA
4
Shi J, Walsh D, Zou W, Rebello NJ, Deagen ME, Fransen KA, Gao X, Olsen BD, Audus DJ. Calculating Pairwise Similarity of Polymer Ensembles via Earth Mover's Distance. ACS Polym Au 2024; 4:66-76. [PMID: 38371731 PMCID: PMC10870752 DOI: 10.1021/acspolymersau.3c00029] [Received: 09/21/2023] [Revised: 11/28/2023] [Accepted: 11/29/2023] [Indexed: 02/20/2024]
Abstract
Synthetic polymers, in contrast to small molecules and deterministic biomacromolecules, are typically ensembles composed of polymer chains with varying numbers, lengths, sequences, chemistry, and topologies. While numerous approaches exist for measuring pairwise similarity among small molecules and sequence-defined biomacromolecules, accurately determining the pairwise similarity between two polymer ensembles remains challenging. This work proposes the earth mover's distance (EMD) metric to calculate the pairwise similarity score between two polymer ensembles. EMD offers a greater resolution of chemical differences between polymer ensembles than the averaging method and provides a quantitative numeric value representing the pairwise similarity between polymer ensembles in alignment with chemical intuition. The EMD approach for assessing polymer similarity enhances the development of accurate chemical search algorithms within polymer databases and can improve machine learning techniques for polymer design, optimization, and property prediction.
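The paper's EMD compares whole polymer ensembles using chemically informed ground distances, which in general requires solving a transport problem. As a simplified, hypothetical illustration of the metric itself (not the authors' implementation), the sketch below computes EMD between two distributions over a single ordered quantity, such as chain length, where EMD reduces to the area between the two cumulative distributions. The function name and the chain-length example are assumptions.

```python
def emd_1d(p, q, positions):
    """Earth mover's distance between two discrete distributions on sorted 1D positions.

    In one dimension, EMD equals the integral of |CDF_p - CDF_q|, so no
    transport solver is needed. p and q are normalized to unit mass first.
    """
    if not (len(p) == len(q) == len(positions)):
        raise ValueError("p, q and positions must have the same length")
    total_p, total_q = sum(p), sum(q)
    cdf_gap, distance = 0.0, 0.0
    for i in range(len(positions) - 1):
        cdf_gap += p[i] / total_p - q[i] / total_q
        distance += abs(cdf_gap) * (positions[i + 1] - positions[i])
    return distance


if __name__ == "__main__":
    # Hypothetical chain-length (degree of polymerization) histograms of two samples.
    lengths = [50, 100, 150, 200]
    sample_a = [0.1, 0.4, 0.4, 0.1]
    sample_b = [0.0, 0.2, 0.4, 0.4]
    print(emd_1d(sample_a, sample_b, lengths))
```

Unlike comparing only average chain lengths, this distance changes whenever the shape of the distribution changes, which mirrors the abstract's point about resolution compared with averaging.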
Affiliation(s)
- Jiale Shi
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Dylan Walsh
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Weizhong Zou
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Nathan J. Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Michael E. Deagen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Katharina A. Fransen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Xian Gao
- Department of Chemical and Biomolecular Engineering, University of Notre Dame, Notre Dame, Indiana 46556, United States
- Bradley D. Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Debra J. Audus
- Materials Science and Engineering Division, National Institute of Standards and Technology, Gaithersburg, Maryland 20899, United States
5
Qiu H, Liu L, Qiu X, Dai X, Ji X, Sun ZY. PolyNC: a natural and chemical language model for the prediction of unified polymer properties. Chem Sci 2024; 15:534-544. [PMID: 38179518 PMCID: PMC10763023 DOI: 10.1039/d3sc05079c] [Received: 09/27/2023] [Accepted: 12/04/2023] [Indexed: 01/06/2024]
Abstract
Language models exhibit a profound aptitude for addressing multimodal and multidomain challenges, a competency that eludes the majority of off-the-shelf machine learning models. Consequently, language models hold great potential for comprehending the intricate interplay between material compositions and diverse properties, thereby accelerating material design, particularly in the realm of polymers. While past limitations in polymer data hindered the use of data-intensive language models, the growing availability of standardized polymer data and effective data augmentation techniques now opens doors to previously uncharted territories. Here, we present a revolutionary model to enable rapid and precise prediction of Polymer properties via the power of Natural language and Chemical language (PolyNC). To showcase the efficacy of PolyNC, we have meticulously curated a labeled prompt-structure-property corpus encompassing 22 970 polymer data points on a series of essential polymer properties. Through the use of natural language prompts, PolyNC gains a comprehensive understanding of polymer properties, while employing chemical language (SMILES) to describe polymer structures. In a unified text-to-text manner, PolyNC consistently demonstrates exceptional performance on both regression tasks (such as property prediction) and the classification task (polymer classification). Simultaneous and interactive multitask learning enables PolyNC to holistically grasp the structure-property relationships of polymers. Through a combination of experiments and characterizations, the generalization ability of PolyNC has been demonstrated, with attention analysis further indicating that PolyNC effectively learns structural information about polymers from multimodal inputs. 
This work provides compelling evidence of the potential for deploying end-to-end language models in polymer research, representing a significant advancement in the AI community's dedicated pursuit of advancing polymer science.
Affiliation(s)
- Haoke Qiu
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- Lunyang Liu
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xuepeng Qiu
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xuemin Dai
- CAS Key Laboratory of High-Performance Synthetic Rubber and its Composite Materials, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- Xiangling Ji
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
- Zhao-Yan Sun
- State Key Laboratory of Polymer Physics and Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences Changchun 130022 China
- School of Applied Chemistry and Engineering, University of Science and Technology of China Hefei 230026 China
6
Malashin IP, Tynchenko VS, Nelyub VA, Borodulin AS, Gantimurov AP. Estimation and Prediction of the Polymers' Physical Characteristics Using the Machine Learning Models. Polymers (Basel) 2023; 16:115. [PMID: 38201778 PMCID: PMC10780762 DOI: 10.3390/polym16010115] [Received: 11/25/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
This article investigates the utility of machine learning (ML) methods for predicting and analyzing the diverse physical characteristics of polymers. Leveraging a rich dataset of polymer characteristics, the study covers an extensive range of polymer properties, from compressive and tensile strength to thermal and electrical behavior. Several families of regression methods (ensemble, tree-based, regularization, and distance-based) are evaluated using the most common quality metrics. A series of experiments on the selection of effective model parameters identified settings that provide a high-quality solution to the stated problem. The best results were achieved by Random Forest, with the highest R² scores of 0.71, 0.73, and 0.88 for glass transition, thermal decomposition, and melting temperatures, respectively. The outcomes are compared in detail, providing insights into the efficiency of distinct ML approaches in predicting polymer properties. Unknown values for each characteristic were predicted, and the method was validated by training on the predicted values and comparing the results with the specified variance values of each characteristic. The research not only advances our comprehension of polymer physics but also contributes to informed model selection and optimization for materials science applications.
Affiliation(s)
- Ivan Pavlovich Malashin
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Vadim Sergeevich Tynchenko
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Information-Control Systems Department, Institute of Computer Science and Telecommunications, Reshetnev Siberian State University of Science and Technology, 660037 Krasnoyarsk, Russia
- Department of Technological Machines and Equipment of Oil and Gas Complex, School of Petroleum and Natural Gas Engineering, Siberian Federal University, 660041 Krasnoyarsk, Russia
- Vladimir Aleksandrovich Nelyub
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Aleksei Sergeevich Borodulin
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
- Andrei Pavlovich Gantimurov
- Artificial Intelligence Technology Scientific and Education Center, Bauman Moscow State Technical University, 105005 Moscow, Russia
7
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS Polym Au 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
- Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
8
Sanchez Medina E, Kunchapu S, Sundmacher K. Gibbs-Helmholtz Graph Neural Network for the Prediction of Activity Coefficients of Polymer Solutions at Infinite Dilution. J Phys Chem A 2023; 127:9863-9873. [PMID: 37943172 PMCID: PMC10683018 DOI: 10.1021/acs.jpca.3c05892] [Received: 08/31/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/10/2023]
Abstract
Machine learning models have gained prominence for predicting pure-component properties, yet their application to mixture property prediction remains relatively limited. However, mixtures are undeniably significant in daily life, particularly in industries such as polymer processing. This study presents a modification of the Gibbs-Helmholtz graph neural network (GH-GNN) model for predicting weight-based activity coefficients at infinite dilution (Ω_ij^∞) in polymer solutions. We evaluate polymer representations ranging from monomer and repeating unit to periodic unit and oligomer, and observe that, in data-scarce scenarios of polymer-solvent mixtures, the specifics of the polymer representation have a smaller impact than in data-rich environments. Leveraging transfer learning, we harness richer activity coefficient data from small-size systems, enhancing model accuracy and reducing prediction variability. The modified GH-GNN model achieves remarkable prediction results in mixture interpolation and solvent extrapolation tasks, with an overall mean absolute error of 0.15, showcasing the potential of graph-neural-network-based models for property prediction of polymer solutions. Comparative analysis with the established models UNIFAC-ZM and Entropic-FV suggests a promising avenue for future research on the use of data-driven models for the prediction of the thermodynamic properties of polymer solutions.
Affiliation(s)
- Edgar Ivan Sanchez Medina
- Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Sreekanth Kunchapu
- Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Kai Sundmacher
- Chair for Process Systems Engineering, Otto-von-Guericke University, Universitätsplatz 2, Magdeburg 39106, Germany
- Process Systems Engineering, Max Planck Institute for Dynamics of Complex Technical Systems, Sandtorstraße 1, Magdeburg 39106, Germany
9
Hu J, Li Z, Lin J, Zhang L. Prediction and Interpretability of Glass Transition Temperature of Homopolymers by Data-Augmented Graph Convolutional Neural Networks. ACS Appl Mater Interfaces 2023; 15:54006-54017. [PMID: 37934171 DOI: 10.1021/acsami.3c13698] [Indexed: 11/08/2023]
Abstract
Establishing the structure-property relationship with machine learning (ML) models is extremely valuable for accelerating the molecular design of polymers. However, existing ML models for polymers suffer from scarce training data and limited variation in the graph structures of molecules. In addition, few works have explored the interpretability of ML models to infer latent knowledge in polymer science that could inspire ML-assisted molecular design. In this contribution, we integrate graph convolutional neural networks (GCNs) with a data augmentation strategy to predict the glass transition temperature Tg of polymers. We demonstrate that the data-augmented GCN model outperforms conventional models and achieves higher accuracy in predicting Tg despite a small amount of training data. Furthermore, taking advantage of molecular graph representations, the data-augmented GCN model can infer the importance of atoms or substructures for Tg, which generally agrees with experimental findings in polymer science. The inferred knowledge of the GCN model is used to advise on the design of functional polymers with specific Tg. The data-augmented GCN model offers clear advantages in establishing structure-property relationships and provides an efficient way to accelerate the rational design of polymer molecules.
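As a hedged, generic sketch of the graph-convolution operation named above (the widely used Kipf-Welling propagation rule, not the authors' trained model or their augmentation strategy), node features are mixed with their neighbours through a symmetrically normalized adjacency matrix with self-loops, then mean-pooled into one graph vector that a regressor could map to Tg. The weights are placeholders, and all names are hypothetical.

```python
import math


def normalized_adjacency(n, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list-of-lists matrix."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One graph-convolution layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


def readout_mean(h):
    """Average node embeddings into a single graph-level vector."""
    return [sum(col) / len(h) for col in zip(*h)]


if __name__ == "__main__":
    # Toy 3-atom chain with 2 input features per atom (hypothetical values).
    a_hat = normalized_adjacency(3, [(0, 1), (1, 2)])
    h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    w = [[0.5, -0.2], [0.1, 0.3]]
    print(readout_mean(gcn_layer(a_hat, h0, w)))
```

Stacking several such layers lets each atom's embedding reflect progressively larger neighbourhoods, which is also what makes per-atom importance analysis possible.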
Affiliation(s)
- Junyang Hu
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Zean Li
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
10
Rebello NJ, Lin TS, Nazeer H, Olsen BD. BigSMARTS: A Topologically Aware Query Language and Substructure Search Algorithm for Polymer Chemical Structures. J Chem Inf Model 2023; 63:6555-6568. [PMID: 37874026 DOI: 10.1021/acs.jcim.3c00978] [Indexed: 10/25/2023]
Abstract
Molecular search is important in chemistry, biology, and informatics for identifying molecular structures within large data sets, improving knowledge discovery and innovation, and making chemical data FAIR (findable, accessible, interoperable, reusable). Search algorithms for polymers are significantly less developed than those for small molecules because polymer search relies on searching by polymer name, which is challenging because polymer naming is overly broad (e.g., polyethylene), complicated for complex chemical structures, and often does not correspond to official IUPAC conventions. Chemical structure search in polymers is limited to substructures, such as monomers, without awareness of connectivity or topology. This work introduces a novel query language and graph traversal search algorithm for polymers that provides the first search method able to fully capture all of the chemical structures present in polymers. The BigSMARTS query language, an extension of the small-molecule SMARTS language, allows users to write queries that localize monomer and functional group searches to different parts of the polymer, such as the middle block of a triblock, the side chain of a graft, or the backbone of a repeat unit. The substructure search algorithm is based on the traversal of graph representations of the generating functions for the stochastic graphs of polymers. Operationally, the algorithm first identifies cycles representing the monomers, then the end groups, and finally performs a depth-first search to match entire subgraphs. To validate the algorithm, hundreds of queries were searched against hundreds of target chemistries and topologies from the literature, giving approximately 440,000 query-target pairs. This tool provides a detailed algorithm that can be implemented in search engines to return results with full matching of monomer connectivity and polymer topology.
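The full BigSMARTS algorithm (monomer-cycle detection, end groups, generating-function graphs) goes beyond a short example. The final step mentioned above, a depth-first search that matches whole subgraphs, can be sketched generically as a backtracking subgraph-matching search over labeled graphs. This is a simplified, hypothetical illustration rather than the BigSMARTS code: it ignores bond orders, stochastic connectivity, and topology-localized queries, and all names are assumptions.

```python
def find_embeddings(q_labels, q_edges, t_labels, t_edges):
    """Backtracking depth-first search for label-preserving subgraph embeddings.

    Returns every mapping {query_node: target_node} that is injective, matches
    node labels, and maps every query edge onto a target edge.
    """
    def adjacency(labels, edges):
        adj = {v: set() for v in labels}
        for a, b in edges:
            adj[a].add(b)
            adj[b].add(a)
        return adj

    q_adj, t_adj = adjacency(q_labels, q_edges), adjacency(t_labels, t_edges)
    order = list(q_labels)  # query nodes in the order they are assigned
    results = []

    def extend(mapping):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        q = order[len(mapping)]
        used = set(mapping.values())
        for t in t_labels:
            if t in used or t_labels[t] != q_labels[q]:
                continue
            # Every already-mapped query neighbour must map to a target neighbour of t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                extend(mapping)   # go deeper (depth-first)
                del mapping[q]    # backtrack

    extend({})
    return results


if __name__ == "__main__":
    # Query C-O against a toy C-C-O target (hypothetical labels, hydrogens omitted).
    print(find_embeddings({0: "C", 1: "O"}, [(0, 1)],
                          {0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]))
```

Ordering query nodes so that each one is adjacent to an already-assigned node usually prunes the search much earlier; plain insertion order is used here for brevity.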
Affiliation(s)
- Nathan J Rebello
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Tzyy-Shyang Lin
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Heeba Nazeer
- Department of Computer Science, Wellesley College, 106 Central Street, Wellesley, Massachusetts 02481, United States
- Bradley D Olsen
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
11
Lo S, Seifrid M, Gaudin T, Aspuru-Guzik A. Augmenting Polymer Datasets by Iterative Rearrangement. J Chem Inf Model 2023. [PMID: 37390494 DOI: 10.1021/acs.jcim.3c00144] [Indexed: 07/02/2023]
Abstract
One of the biggest obstacles to successful polymer property prediction is finding an effective representation that accurately captures the sequence of repeat units in a polymer. Motivated by the success of data augmentation in computer vision and natural language processing, we explore augmenting polymer data by iteratively rearranging the molecular representation while preserving the correct connectivity, revealing additional substructural information that is not present in a single representation. We evaluate the effects of this technique on the performance of machine learning models trained on three polymer datasets and compare them with common molecular representations. Overall, data augmentation does not yield significant improvements in machine learning property prediction performance compared with equivalent (non-augmented) representations. However, in datasets where the target property is primarily influenced by the polymer sequence rather than by experimental parameters, this data augmentation technique provides the molecular embedding with more information and improves property prediction accuracy.
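One simple way to picture "rearranging the representation while preserving connectivity" is shown below, under a loudly stated assumption: a periodic chain is written as a hypothetical list of backbone tokens, and every cyclic rotation of that repeat unit describes the same infinite chain. The paper operates on molecular string representations and its exact procedure may differ; the function names and token format are illustrative only.

```python
def rotations(repeat_unit):
    """All distinct cyclic rotations of a repeat unit given as a list of backbone tokens.

    For an infinitely repeating chain ...-A-B-C-A-B-C-..., starting the repeat
    unit at A, B or C describes the same polymer, so each rotation is an
    alternative representation with unchanged connectivity.
    """
    seen, out = set(), []
    for shift in range(len(repeat_unit)):
        candidate = tuple(repeat_unit[shift:] + repeat_unit[:shift])
        if candidate not in seen:  # symmetric units yield repeated rotations
            seen.add(candidate)
            out.append(list(candidate))
    return out


def augment(dataset):
    """Expand (repeat_unit, label) pairs with every distinct rotation, keeping the label."""
    return [(rot, label) for unit, label in dataset for rot in rotations(unit)]


if __name__ == "__main__":
    # Hypothetical three-token repeat unit with a made-up property label.
    for unit, label in augment([(["A", "B", "C"], 1.0)]):
        print(unit, label)
```

Because every augmented copy shares the original label, the model sees the same polymer from several starting points, which matters most when the property depends on the sequence itself, consistent with the abstract's finding.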
Affiliation(s)
- Stanley Lo
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Martin Seifrid
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Théophile Gaudin
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- IBM Research Zürich, Rüschlikon, Zürich 8803, Switzerland
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George Street, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George Street, Toronto, Ontario M5S 2E4, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, 200 College St., Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science and Engineering, University of Toronto, 184 College St., Toronto, Ontario M5S 3E4, Canada
- CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, Ontario M5S 1M1, Canada
- Canadian Institute for Advanced Research (CIFAR), Toronto, Ontario M5S 1M1, Canada
12
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023]
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect of enabling ML integration in research workflows, yet many data models impose significant rigidity, making it difficult to accommodate the broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers who want to leverage their historical data in ML development. Here we show that a domain-specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and experimental validation of catalysts and polymers in the context of ring-opening polymerization, although we also provide examples of how CMDL can be applied more broadly to other polymer classes. Critically, we show how the CMDL-tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates the translation of historical data into meaningful predictive and generative models that produce experimentally actionable output.
Affiliation(s)
- Matteo Manica
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
- James L Hedrick
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Tim Erdmann
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Nil Adell-Mill
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
13
Gurnani R, Kuenneth C, Toland A, Ramprasad R. Polymer Informatics at Scale with Multitask Graph Neural Networks. Chem Mater 2023; 35:1560-1567. [PMID: 36873627 PMCID: PMC9979603 DOI: 10.1021/acs.chemmater.2c02991] [Received: 09/29/2022] [Revised: 02/03/2023] [Indexed: 06/18/2023]
Abstract
Artificial intelligence-based methods are becoming increasingly effective at screening libraries of polymers down to a selection that is manageable for experimental inquiry. The vast majority of presently adopted approaches for polymer screening rely on handcrafted chemostructural features extracted from polymer repeat units, a burdensome task as polymer libraries, which approximate the polymer chemical search space, progressively grow over time. Here, we demonstrate that directly "machine learning" important features from a polymer repeat unit is a cheap and viable alternative to extracting expensive features by hand. Our approach, based on graph neural networks, multitask learning, and other advanced deep learning techniques, speeds up feature extraction by 1-2 orders of magnitude relative to presently adopted handcrafted methods without compromising model accuracy for a variety of polymer property prediction tasks. We anticipate that our approach, which unlocks the screening of truly massive polymer libraries at scale, will enable more sophisticated and larger-scale screening technologies in the field of polymer informatics.
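Multitask learning on polymer data usually faces a sparse label matrix: most polymers have only some properties measured. A common way to handle this, shown here as an assumption rather than as the authors' exact loss, is to average the error only over observed (polymer, property) entries, so that one shared network can be trained on all tasks at once. The function name and the use of None for missing labels are hypothetical choices.

```python
def masked_multitask_mse(predictions, targets):
    """Mean squared error over all (polymer, property) pairs that have a label.

    predictions: one list of per-task outputs per polymer.
    targets: same shape, with None wherever a property was never measured.
    Missing labels are skipped, so the shared model learns only from measured values.
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(predictions, targets):
        for p, t in zip(pred_row, target_row):
            if t is not None:
                total += (p - t) ** 2
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two polymers, two properties each; each polymer is missing one label (made-up numbers).
    preds = [[1.0, 2.0], [3.0, 4.0]]
    labels = [[1.5, None], [None, 2.0]]
    print(masked_multitask_mse(preds, labels))
```

In a deep learning framework the same idea is usually written as an elementwise loss multiplied by a 0/1 mask and divided by the mask's sum.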