1
Hau HM, Holstun T, Lee E, Rinkel BLD, Mishra TP, Markuson DiPrince M, Mohanakrishnan RS, Self EC, Persson KA, McCloskey BD, Ceder G. Disordered Rocksalts as High-Energy and Earth-Abundant Li-Ion Cathodes. Adv Mater 2025:e2502766. [PMID: 40326162 DOI: 10.1002/adma.202502766] [Received: 02/10/2025] [Revised: 04/09/2025] [Indexed: 05/07/2025]
Abstract
To address the growing demand for energy and support the shift toward transportation electrification and intermittent renewable energy, there is an urgent need for low-cost, energy-dense electrical storage. Research on Li-ion electrode materials has predominantly focused on ordered materials with well-defined lithium diffusion channels, limiting cathode design to resource-constrained Ni- and Co-based oxides and lower-energy polyanion compounds. Recently, disordered rocksalts with lithium excess (DRX) have demonstrated high capacity and energy density when lithium excess and/or local ordering allow statistical percolation of lithium sites through the structure. This cation disorder can be induced in a broad range of compositions by high-temperature or mechanochemical synthesis. DRX oxides and oxyfluorides containing Earth-abundant transition metals have been prepared by various routes, including solid-state, molten-salt, and sol-gel reactions. This review outlines DRX design principles and explains how synthesis conditions affect cation disorder and short-range cation ordering (SRO), which determines cycling stability and rate capability. In addition, strategies to enhance Li transport and capacity retention in Mn-rich DRX with partial spinel-like ordering are discussed. Finally, the review considers the optimization of carbon and electrolyte for DRX cathodes and addresses key challenges and opportunities for their commercialization.
Affiliation(s)
- Han-Ming Hau
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tucker Holstun
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Eunryeol Lee
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Bernardine L D Rinkel
  - Department of Chemical and Biomolecular Engineering, University of California-Berkeley, Berkeley, CA, 94720, USA
  - Energy and Distributed Resources Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Tara P Mishra
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Max Markuson DiPrince
  - Bredesen Center for Interdisciplinary Research and Education, University of Tennessee Knoxville, Knoxville, TN, 37996, USA
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Rohith Srinivaas Mohanakrishnan
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Ethan C Self
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, 37830, USA
- Kristin A Persson
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Bryan D McCloskey
  - Department of Chemical and Biomolecular Engineering, University of California-Berkeley, Berkeley, CA, 94720, USA
  - Energy and Distributed Resources Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Gerbrand Ceder
  - Department of Materials Science and Engineering, University of California Berkeley, Berkeley, CA, 94720, USA
  - Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
2
Canty RB, Bennett JA, Brown KA, Buonassisi T, Kalinin SV, Kitchin JR, Maruyama B, Moore RG, Schrier J, Seifrid M, Sun S, Vegge T, Abolhasani M. Science acceleration and accessibility with self-driving labs. Nat Commun 2025; 16:3856. [PMID: 40274856 PMCID: PMC12022019 DOI: 10.1038/s41467-025-59231-1] [Received: 09/21/2024] [Accepted: 04/08/2025] [Indexed: 04/26/2025] [Open access]
Abstract
In the evolving landscape of scientific research, the complexity of global challenges demands innovative approaches to experimental planning and execution. Self-Driving Laboratories (SDLs) automate both experimental tasks in the chemical and materials sciences and the design and selection of experiments, optimizing research processes and reducing material usage. This perspective explores improving access to SDLs through centralized facilities and distributed networks. We discuss the technical and collaborative challenges in realizing SDLs' potential to enhance human-machine and human-human collaboration, ultimately fostering a more inclusive research community and enabling previously untenable research projects.
Affiliation(s)
- Richard B Canty
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Jeffrey A Bennett
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
- Keith A Brown
  - Department of Mechanical Engineering, Boston University, Boston, MA, USA
- Tonio Buonassisi
  - Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Sergei V Kalinin
  - Materials Science and Engineering, The University of Tennessee, Knoxville, TN, USA
- John R Kitchin
  - Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
- Benji Maruyama
  - Air Force Research Laboratory, Materials and Manufacturing Directorate, Wright-Patterson AFB, OH, USA
- Robert G Moore
  - Materials Science and Technology Division, Oak Ridge National Laboratory, Oak Ridge, TN, USA
- Joshua Schrier
  - Department of Chemistry and Biochemistry, Fordham University, New York, NY, USA
- Martin Seifrid
  - Department of Materials Science and Engineering, North Carolina State University, Raleigh, NC, USA
- Shijing Sun
  - Department of Mechanical Engineering, University of Washington, Seattle, WA, USA
- Tejs Vegge
  - Department of Energy Conversion and Storage, Technical University of Denmark, Lyngby, Denmark
- Milad Abolhasani
  - Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC, USA
3
Song T, Luo M, Zhang X, Chen L, Huang Y, Cao J, Zhu Q, Liu D, Zhang B, Zou G, Zhang G, Zhang F, Shang W, Fu Y, Jiang J, Luo Y. A Multiagent-Driven Robotic AI Chemist Enabling Autonomous Chemical Research On Demand. J Am Chem Soc 2025; 147:12534-12545. [PMID: 40056128 DOI: 10.1021/jacs.4c17738] [Indexed: 03/10/2025]
Abstract
The successful integration of large language models (LLMs) into laboratory workflows has demonstrated robust capabilities in natural language processing, autonomous task execution, and collaborative problem-solving. This offers an exciting opportunity to realize the dream of autonomous chemical research on demand. Here, we report a robotic AI chemist powered by a hierarchical multiagent system, ChemAgents, based on an on-board Llama-3.1-70B LLM, capable of executing complex, multistep experiments with minimal human intervention. It operates through a Task Manager agent that interacts with human researchers and coordinates four role-specific agents (Literature Reader, Experiment Designer, Computation Performer, and Robot Operator), each leveraging one of four foundational resources: a comprehensive Literature Database, an extensive Protocol Library, a versatile Model Library, and a state-of-the-art Automated Lab. We demonstrate its versatility and efficacy through six experimental tasks of varying complexity, ranging from straightforward synthesis and characterization to more complex exploration and screening of experimental parameters, culminating in the discovery and optimization of functional materials. Additionally, we introduce a seventh task, in which ChemAgents is deployed in a new robotic chemistry lab environment to autonomously perform photocatalytic organic reactions, highlighting its scalability and adaptability. Our multiagent-driven robotic AI chemist showcases the potential of on-demand autonomous chemical research to accelerate discovery and democratize access to advanced experimental capabilities across academic disciplines and industries.
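The hierarchical layout described in this abstract can be pictured as a router that hands subtasks to role-specific agents. The sketch below is a hypothetical Python illustration, not ChemAgents' code. The `TaskManager` and `Agent` classes, the stub handlers, and the fixed plan are assumptions. In the real system the plan would come from the on-board LLM. Pairing each agent with the resource listed in the same position is also an assumed reading of the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """A role-specific agent bound to one foundational resource (hypothetical)."""
    role: str
    resource: str
    handle: Callable[[str], str]


@dataclass
class TaskManager:
    """Routes each (role, subtask) pair in a plan to the agent registered for that role."""
    agents: Dict[str, Agent] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def register(self, agent: Agent) -> None:
        self.agents[agent.role] = agent

    def run(self, plan: List[Tuple[str, str]]) -> List[str]:
        results = []
        for role, subtask in plan:
            agent = self.agents[role]  # raises KeyError if no agent fills this role
            self.log.append(f"{role} [{agent.resource}]: {subtask}")
            results.append(agent.handle(subtask))
        return results


def make_stub(role: str) -> Callable[[str], str]:
    # Placeholder for an LLM-backed agent: it only reports what it was asked to do.
    return lambda subtask: f"{role} done: {subtask}"


# Assumed agent-resource pairing (same order as listed in the abstract).
manager = TaskManager()
for role, resource in [
    ("Literature Reader", "Literature Database"),
    ("Experiment Designer", "Protocol Library"),
    ("Computation Performer", "Model Library"),
    ("Robot Operator", "Automated Lab"),
]:
    manager.register(Agent(role, resource, make_stub(role)))

# A fixed plan stands in for the decomposition the Task Manager's LLM would produce.
results = manager.run([
    ("Literature Reader", "find reported synthesis routes"),
    ("Experiment Designer", "draft a protocol"),
    ("Robot Operator", "execute the protocol"),
])
```

Keeping the routing separate from the agents means a new lab environment only changes what the handlers do, not how tasks are dispatched. That matches, loosely, the portability the seventh task is said to demonstrate.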
Affiliation(s)
- Tao Song
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Man Luo
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Xiaolong Zhang
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Linjiang Chen
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - School of Chemistry, School of Computer Science, University of Birmingham, Birmingham B15 2TT, U.K.
- Yan Huang
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Jiaqi Cao
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Qing Zhu
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - Institute of Intelligent Innovation, Henan Academy of Sciences, Zhengzhou 451162, China
- Daobin Liu
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Baicheng Zhang
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Gang Zou
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Guoqing Zhang
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Fei Zhang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Weiwei Shang
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Yao Fu
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - CAS Key Laboratory of Urban Pollutant Conversion, Anhui Province Key Laboratory of Biomass Chemistry, University of Science and Technology of China, Hefei 230026, China
- Jun Jiang
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - Hefei National Laboratory, University of Science and Technology of China, Hefei 230026, China
- Yi Luo
  - State Key Laboratory of Precision and Intelligent Chemistry, Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
  - Hefei National Laboratory, University of Science and Technology of China, Hefei 230026, China
4
Wang T, Qin BR, Li S, Wang Z, Li X, Jiang Y, Qin C, Ouyang Q, Lou C, Qian L. Discovery of diverse and high-quality mRNA capping enzymes through a language model-enabled platform. Sci Adv 2025; 11:eadt0402. [PMID: 40203090 PMCID: PMC11980835 DOI: 10.1126/sciadv.adt0402] [Received: 09/09/2024] [Accepted: 03/04/2025] [Indexed: 04/11/2025]
Abstract
Mining and expanding high-quality genetic parts for synthetic biology and bioengineering are urgent needs in the research and development of next-generation biotechnology. However, gene mining has relied on sequence homology or ample expert knowledge, which fundamentally limits the establishment of a comprehensive genetic part catalog. In this work, we propose SYMPLEX (synthetic biological part mining platform by large language model-enabled knowledge extraction), a universal gene-mining platform based on large language models. We applied SYMPLEX to mine enzymes responsible for messenger RNA (mRNA) capping, a key process in eukaryotic posttranscriptional modification, and obtained thousands of diverse candidates with traceable evidence from biomedical literature and databases. Of the 46 experimentally tested integral capping enzyme candidates, 14 demonstrated in vivo cross-species capping activity, and 2 displayed superior in vitro activity over the commercial vaccinia capping enzymes currently used in mRNA vaccine production. SYMPLEX provides a distinct paradigm for functional gene mining and offers powerful tools to facilitate knowledge discovery in fundamental research.
Affiliation(s)
- Tianze Wang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Bowen R. Qin
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Sihong Li
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Zimo Wang
  - Center for Cell and Gene Circuit Design, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Xuejian Li
  - Beyond Flux Technology Co. Ltd., Beijing 100000, China
- Yuanxu Jiang
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chenrui Qin
  - Institute for Advanced Study in Physics, Zhejiang University, Hangzhou 310058, China
- Qi Ouyang
  - Institute for Advanced Study in Physics, Zhejiang University, Hangzhou 310058, China
- Chunbo Lou
  - Center for Cell and Gene Circuit Design, Key Laboratory of Quantitative Synthetic Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Long Qian
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
5
Schilling-Wilhelmi M, Ríos-García M, Shabih S, Gil MV, Miret S, Koch CT, Márquez JA, Jablonka KM. From text to insight: large language models for chemical data extraction. Chem Soc Rev 2025; 54:1125-1150. [PMID: 39703015 DOI: 10.1039/d4cs00913d] [Indexed: 12/21/2024]
Abstract
The vast majority of chemical knowledge exists in unstructured natural language, yet structured data is crucial for innovative and systematic materials design. Traditionally, the field has relied on manual curation and partial automation for data extraction for specific use cases. The advent of large language models (LLMs) represents a significant shift, potentially enabling non-experts to extract structured, actionable data from unstructured text efficiently. While applying LLMs to chemical and materials science data extraction presents unique challenges, domain knowledge offers opportunities to guide and validate LLM outputs. This tutorial review provides a comprehensive overview of LLM-based structured data extraction in chemistry, synthesizing current knowledge and outlining future directions. We address the lack of standardized guidelines and present frameworks for leveraging the synergy between LLMs and chemical expertise. This work serves as a foundational resource for researchers aiming to harness LLMs for data-driven chemical research. The insights presented here could significantly enhance how researchers across chemical disciplines access and utilize scientific information, potentially accelerating the development of novel compounds and materials for critical societal needs.
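The review's point that domain knowledge can validate LLM outputs can be made concrete with a small post-extraction check. The sketch below is a hypothetical illustration, not a method taken from the review. It assumes the model was asked to return one JSON object per synthesis, and the schema fields (`compound`, `yield_percent`, `temperature_C`) are invented for the example. No LLM is called; the raw string stands in for model output. The sanity rules encode two facts that always hold: a yield lies between 0 and 100 %, and no temperature is below absolute zero (-273.15 °C).

```python
import json
from typing import Any, Callable, Dict, List


def is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


# Hypothetical schema: each field maps to a plausibility check based on chemical knowledge.
FIELD_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "compound": lambda v: isinstance(v, str) and v.strip() != "",
    "yield_percent": lambda v: is_number(v) and 0 <= v <= 100,
    "temperature_C": lambda v: is_number(v) and v >= -273.15,
}


def validate_record(raw: str) -> List[str]:
    """Check one LLM-produced JSON record; an empty list means it passed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    for name, check in FIELD_CHECKS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not check(record[name]):
            problems.append(f"implausible value for {name}: {record[name]!r}")
    return problems


ok = validate_record('{"compound": "example-1", "yield_percent": 87, "temperature_C": 120}')
bad = validate_record('{"compound": "example-1", "yield_percent": 870}')
```

Here `ok` is an empty list, while `bad` flags both the out-of-range yield and the missing temperature. Records that fail could be sent back to the model with the problem list as feedback, or routed to a human curator.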
Affiliation(s)
- Mara Schilling-Wilhelmi
  - Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany
- Martiño Ríos-García
  - Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany
  - Institute of Carbon Science and Technology (INCAR), CSIC, Francisco Pintado Fe 26, 33011 Oviedo, Spain
- Sherjeel Shabih
  - Department of Physics and CSMB, Humboldt-Universität zu Berlin, Berlin, Germany
- María Victoria Gil
  - Institute of Carbon Science and Technology (INCAR), CSIC, Francisco Pintado Fe 26, 33011 Oviedo, Spain
- Christoph T Koch
  - Department of Physics and CSMB, Humboldt-Universität zu Berlin, Berlin, Germany
- José A Márquez
  - Department of Physics and CSMB, Humboldt-Universität zu Berlin, Berlin, Germany
- Kevin Maik Jablonka
  - Laboratory of Organic and Macromolecular Chemistry (IOMC), Friedrich Schiller University Jena, Humboldtstrasse 10, 07743 Jena, Germany
  - Center for Energy and Environmental Chemistry Jena (CEEC Jena), Friedrich Schiller University Jena, Philosophenweg 7a, 07743 Jena, Germany
  - Helmholtz Institute for Polymers in Energy Applications Jena (HIPOLE Jena), Lessingstrasse 12-14, 07743 Jena, Germany
6
Šiaučiulis M, Knittl-Frank C, M Mehr SH, Clarke E, Cronin L. Reaction blueprints and logical control flow for parallelized chiral synthesis in the Chemputer. Nat Commun 2024; 15:10261. [PMID: 39592595 PMCID: PMC11599859 DOI: 10.1038/s41467-024-54238-6] [Received: 04/21/2024] [Accepted: 11/06/2024] [Indexed: 11/28/2024] [Open access]
Abstract
Despite the recent proliferation of programmable robotic chemistry hardware, current chemical programming ontologies lack essential structured programming constructs such as variables, functions, and loops. Herein we present an integration of these concepts into χDL, a universal high-level chemical programming language executable in the Chemputer. To achieve this, we introduce reaction blueprints as a chemical analog of functions in computer science, allowing sets of synthesis operations to be applied to different reagents and conditions. We further expand χDL with logical operation queues and iteration via pattern matching. Together, these new features allow chemical syntheses to be encoded as generalized, reproducible, and parallelized digital workflows rather than as opaque and entangled single-step operations. This is showcased by synthesizing chiral diarylprolinol catalysts and subsequently using them in various synthetic transformations (13 separate automated runs affording 3 organocatalysts and 12 distinct enantioenriched products in 42-97% yield, up to >99:1 er), including automated catalyst recycling and reuse.
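The analogy between reaction blueprints and functions can be illustrated in ordinary Python. This is a hypothetical sketch, not χDL syntax and not the Chemputer's API. The step names (`add`, `stir`, `workup`), the blueprint's parameters, and the reagent labels are placeholders. The sketch shows only the idea stated in the abstract: one fixed operation sequence is parameterized by reagents and conditions, then expanded over a list of runs.

```python
from typing import Callable, Dict, List, Tuple

# One operation: (placeholder step name, parameters).
Step = Tuple[str, Dict[str, object]]


def catalysis_blueprint(substrate: str, catalyst: str, temp_c: float, time_h: float) -> List[Step]:
    """A 'blueprint' in the function sense: a fixed sequence of operations,
    parameterized by reagents and conditions (step names are placeholders, not χDL)."""
    return [
        ("add", {"reagent": catalyst, "vessel": "reactor"}),
        ("add", {"reagent": substrate, "vessel": "reactor"}),
        ("stir", {"temp_C": temp_c, "time_h": time_h}),
        ("workup", {"vessel": "reactor"}),
    ]


def expand(blueprint: Callable[..., List[Step]], runs: List[Dict[str, object]]) -> Dict[str, List[Step]]:
    # Iterating over parameter sets yields one independent step list per run;
    # because the lists share no state, the runs could be dispatched in parallel.
    return {f"run_{i}": blueprint(**params) for i, params in enumerate(runs, start=1)}


workflows = expand(catalysis_blueprint, [
    {"substrate": "substrate A", "catalyst": "catalyst 1", "temp_c": 25, "time_h": 12},
    {"substrate": "substrate B", "catalyst": "catalyst 1", "temp_c": 0, "time_h": 24},
])
```

The recipe is written once, and each new substrate or condition set is a new argument list rather than a copied procedure. That reuse is what the abstract contrasts with "opaque and entangled single-step operations."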
Affiliation(s)
- S Hessam M Mehr
  - Advanced Research Centre, University of Glasgow, 11 Chapel Lane, Glasgow, UK
- Emma Clarke
  - Advanced Research Centre, University of Glasgow, 11 Chapel Lane, Glasgow, UK
- Leroy Cronin
  - Advanced Research Centre, University of Glasgow, 11 Chapel Lane, Glasgow, UK
7
He D, Jiang Y, Guillén-Soler M, Geary Z, Vizcaíno-Anaya L, Salley D, Gimenez-Lopez MDC, Long DL, Cronin L. Algorithm-Driven Robotic Discovery of Polyoxometalate-Scaffolding Metal-Organic Frameworks. J Am Chem Soc 2024; 146:28952-28960. [PMID: 39382313 PMCID: PMC11503775 DOI: 10.1021/jacs.4c09553] [Received: 07/16/2024] [Revised: 09/09/2024] [Accepted: 09/20/2024] [Indexed: 10/10/2024]
Abstract
The experimental exploration of the chemical space of crystalline materials, especially metal-organic frameworks (MOFs), requires multiparameter control over a large set of reactions, which is unavoidably time-consuming and labor-intensive when performed manually. To accelerate material discovery while maintaining high reproducibility, we developed a machine learning algorithm integrated with a robotic synthesis platform for closed-loop exploration of the chemical space of polyoxometalate-scaffolding metal-organic frameworks (POMOFs). The eXtreme Gradient Boosting (XGBoost) model was optimized using data updated from uncertainty-feedback experiments, together with a multiclass classification extension based on classifying POMOFs by their chemical constitution. The digital signatures for the robotic synthesis of POMOFs were represented in the universal chemical description language (χDL) to precisely record the synthetic steps and enhance reproducibility. Nine novel POMOFs, including one with mixed ligands derived from individual ligands through the imidization reaction of POM amine derivatives with various aldehydes, have been discovered with good repeatability. In addition, chemical space maps were plotted based on the XGBoost models with F1 scores above 0.8. Furthermore, the electrochemical properties of the synthesized POMOFs indicate superior electron transfer compared with the molecular POMs, and a direct effect of the Zn ratio, the ligand type, and the topology of the POMOFs on their electron transfer ability.
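The closed loop described above relies on two generic ingredients. One is choosing the next experiments where the classifier is least certain. The other is scoring the multiclass model with F1. The sketch below is a hypothetical Python illustration of both. It assumes predictive entropy as the uncertainty measure and macro averaging for F1. The abstract does not state either choice, so both may differ from the authors' setup. A dictionary of made-up class probabilities stands in for a trained XGBoost model's predictions on untested conditions.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def pick_most_uncertain(predictions: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Return the k candidate conditions whose predicted class distribution is most uncertain."""
    return sorted(predictions, key=lambda c: entropy(predictions[c]), reverse=True)[:k]


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean over classes of F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical class probabilities for three untested synthesis conditions.
predictions = {
    "cond_A": [0.98, 0.01, 0.01],  # model is confident: little to learn here
    "cond_B": [0.34, 0.33, 0.33],  # near-uniform: most informative to run next
    "cond_C": [0.60, 0.40, 0.00],
}
next_batch = pick_most_uncertain(predictions, k=2)
```

`next_batch` comes out as `["cond_B", "cond_C"]`. After those experiments run, their outcomes would be appended to the training data and the model refit. This is the "updating data" step of the loop, which the sketch leaves out.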
Affiliation(s)
- Donglin He
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Yibin Jiang
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Melanie Guillén-Soler
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Zack Geary
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Lucia Vizcaíno-Anaya
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
  - Centro Singular de Investigación en Química Biolóxica e Materiais Moleculares (CiQUS), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain
- Daniel Salley
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Maria Del Carmen Gimenez-Lopez
  - Centro Singular de Investigación en Química Biolóxica e Materiais Moleculares (CiQUS), Universidade de Santiago de Compostela, Santiago de Compostela 15782, Spain
- De-Liang Long
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
- Leroy Cronin
  - School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, United Kingdom
8
Lu JM, Wang HF, Guo QH, Wang JW, Li TT, Chen KX, Zhang MT, Chen JB, Shi QN, Huang Y, Shi SW, Chen GY, Pan JZ, Lu Z, Fang Q. Roboticized AI-assisted microfluidic photocatalytic synthesis and screening up to 10,000 reactions per day. Nat Commun 2024; 15:8826. [PMID: 39396057 PMCID: PMC11470948 DOI: 10.1038/s41467-024-53204-6] [Received: 03/12/2024] [Accepted: 10/04/2024] [Indexed: 10/14/2024] [Open access]
Abstract
The throughput of conventional organic chemical synthesis is usually a few experiments per operator per day. We develop a robotic system for ultra-high-throughput chemical synthesis, online characterization, and large-scale condition screening of photocatalytic reactions, based on liquid-core waveguide, microfluidic liquid-handling, and artificial intelligence techniques. The system automates the preparation, exchange, and introduction of reactant mixtures, ultra-fast photocatalytic reactions completed in seconds, online spectroscopic detection of the reaction product, and the screening of different reaction conditions. We apply the system to a large-scale screen of 12,000 reaction conditions for a photocatalytic [2 + 2] cycloaddition involving multiple continuous and discrete variables, reaching a throughput of up to 10,000 reaction conditions per day. Based on these data, AI-assisted cross-substrate/photocatalyst prediction is conducted.
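To put "up to 10,000 reaction conditions per day" in perspective, a short calculation converts the daily rate into an average time budget per condition. It also estimates how long the 12,000-condition screen would take at that rate. This is back-of-the-envelope arithmetic, not data from the paper. The 10,000 per day figure is a peak rate, so the 1.2-day result is a lower bound. The manual comparison reads "a few" experiments per day as 5, which is an assumption.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def mean_interval_s(reactions_per_day: float) -> float:
    """Average time budget per reaction condition at a given daily throughput."""
    return SECONDS_PER_DAY / reactions_per_day


def campaign_days(n_conditions: int, reactions_per_day: float) -> float:
    """Days needed to screen n_conditions at a constant daily throughput."""
    return n_conditions / reactions_per_day


robot_interval = mean_interval_s(10_000)    # peak rate quoted in the abstract
robot_days = campaign_days(12_000, 10_000)  # lower bound for the 12,000-condition screen
manual_days = campaign_days(12_000, 5)      # "a few" per day, taken as 5 (assumption)
```

At the peak rate each condition gets about 8.64 s on average, which fits the statement that the reactions finish within seconds. The full screen needs at least 1.2 days of continuous operation. At 5 experiments per day it would take one operator 2,400 working days.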
Affiliation(s)
- Jia-Min Lu
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Hui-Feng Wang
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Qi-Hang Guo
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, Zhejiang University, Hangzhou, China
- Jian-Wei Wang
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Tong-Tong Li
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, Zhejiang University, Hangzhou, China
- Ke-Xin Chen
  - The Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou, China
  - Department of Computer Science and Engineering, The Chinese University of Hong Kong, New Territories, Hong Kong, China
- Meng-Ting Zhang
  - Department of Chemistry, Zhejiang University, Hangzhou, China
- Jian-Bo Chen
  - Department of Chemistry, Zhejiang University, Hangzhou, China
- Qian-Nuan Shi
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Yi Huang
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Shao-Wen Shi
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Guang-Yong Chen
  - The Research Center for Life Sciences Computing, Zhejiang Lab, Hangzhou, China
- Jian-Zhang Pan
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
- Zhan Lu
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Center of Chemistry for Frontier Technologies, Department of Chemistry, Zhejiang University, Hangzhou, China
- Qun Fang
  - Department of Chemistry, Zhejiang University, Hangzhou, China
  - Institute of Intelligent Chemical Manufacturing and iChemFoundry Platform, Engineering Research Center of Functional Materials Intelligent Manufacturing of Zhejiang Province, ZJU-Hangzhou Global Scientific and Technological Innovation Center, Hangzhou, China
  - Key Laboratory of Excited-State Materials of Zhejiang Province, Zhejiang University, Hangzhou, China
9
Leong SX, Pablo-García S, Zhang Z, Aspuru-Guzik A. Automated electrosynthesis reaction mining with multimodal large language models (MLLMs). Chem Sci 2024; 15:d4sc04630g. [PMID: 39397816 PMCID: PMC11462585 DOI: 10.1039/d4sc04630g] [Received: 07/12/2024] [Accepted: 09/13/2024] [Indexed: 10/15/2024] [Open access]
Abstract
Leveraging the chemical data available in legacy formats such as publications and patents is a significant challenge for the community. Automated reaction mining offers a promising way to convert this knowledge into a learnable digital form and thereby help expedite materials and reaction discovery. However, existing reaction mining toolkits are limited to single input modalities (text or images) and cannot effectively integrate heterogeneous data scattered across text, tables, and figures. In this work, we go beyond single input modalities and explore multimodal large language models (MLLMs) for the analysis of diverse data inputs for automated electrosynthesis reaction mining. We compiled a test dataset of 65 articles (MERMES-T24 set) and used it to benchmark five prominent MLLMs on two critical tasks: (i) reaction diagram parsing and (ii) resolving cross-modality data interdependencies. The frontrunner MLLM achieved ≥96% accuracy in both tasks through the strategic integration of single-shot visual prompts and image pre-processing techniques. We integrate this capability into a toolkit named MERMES (multimodal reaction mining pipeline for electrosynthesis). Our toolkit functions as an end-to-end MLLM-powered pipeline that integrates article retrieval, information extraction, and multimodal analysis to streamline and automate knowledge extraction. This work lays the groundwork for the increased use of MLLMs to accelerate the digitization of chemistry knowledge for data-driven research.
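A "cross-modality data interdependency" usually means that a record cannot be assembled from any single source. For example, a reaction scheme in a figure shows a generic substituent R, and only a table lists which R goes with which entry. The sketch below is a hypothetical Python illustration of the final merge step, not part of MERMES. It assumes the scheme and table have already been parsed (the part the MLLM does). The field names, the `{R}` template convention, and all values are invented placeholders, not data from the paper.

```python
from typing import Dict, List


def resolve_entries(scheme: Dict[str, str], table: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge a figure-derived reaction template with per-entry rows from a table.

    The scheme carries shared conditions plus a substrate template containing an
    {R} placeholder; each table row supplies the R group for one entry.
    """
    shared = {key: value for key, value in scheme.items() if key != "substrate_template"}
    return [
        {
            **shared,
            "entry": row["entry"],
            "substrate": scheme["substrate_template"].format(R=row["R"]),
            "yield": row["yield"],
        }
        for row in table
    ]


# Placeholder inputs standing in for what an MLLM might read from a scheme and a table.
scheme = {"substrate_template": "4-{R}-styrene", "anode": "electrode X", "solvent": "solvent Y"}
table = [
    {"entry": "1", "R": "OMe", "yield": "y1"},
    {"entry": "2", "R": "Cl", "yield": "y2"},
]
records = resolve_entries(scheme, table)
```

Each output record combines the scheme's shared conditions with one table row, so `records[1]["substrate"]` becomes `"4-Cl-styrene"`. Neither the figure nor the table alone contains that full record.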
Affiliation(s)
- Shi Xuan Leong
- Department of Chemistry, University of Toronto, Lash Miller Chemical Laboratories 80 St. George Street ON M5S 3H6 Toronto Canada
- Division of Chemistry and Biological Chemistry, School of Chemistry, Chemical Engineering and Biotechnology, Nanyang Technological University 21 Nanyang Link Singapore 637371
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, Lash Miller Chemical Laboratories 80 St. George Street ON M5S 3H6 Toronto Canada
- Department of Computer Science, University of Toronto Sandford Fleming Building, 10 King's College Road ON M5S 3G4 Toronto Canada
- Vector Institute for Artificial Intelligence 661 University Ave. Suite 710 ON M5G 1M1 Toronto Canada
- Zijian Zhang
- Department of Computer Science, University of Toronto Sandford Fleming Building, 10 King's College Road ON M5S 3G4 Toronto Canada
- Vector Institute for Artificial Intelligence 661 University Ave. Suite 710 ON M5G 1M1 Toronto Canada
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Lash Miller Chemical Laboratories 80 St. George Street ON M5S 3H6 Toronto Canada
- Department of Computer Science, University of Toronto Sandford Fleming Building, 10 King's College Road ON M5S 3G4 Toronto Canada
- Vector Institute for Artificial Intelligence 661 University Ave. Suite 710 ON M5G 1M1 Toronto Canada
- Acceleration Consortium 80 St. George St. M5S 3H6 Toronto Canada
- Department of Materials Science & Engineering, University of Toronto 184 College St. M5S 3E4 Toronto Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto 200 College St. ON M5S 3E5 Toronto Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR) 661 University Ave. M5G 1M1 Toronto Canada
10
Ai Q, Meng F, Shi J, Pelkie B, Coley CW. Extracting structured data from organic synthesis procedures using a fine-tuned large language model. DIGITAL DISCOVERY 2024; 3:1822-1831. [PMID: 39157760 PMCID: PMC11322921 DOI: 10.1039/d4dd00091a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Accepted: 07/30/2024] [Indexed: 08/20/2024]
Abstract
The popularity of data-driven approaches and machine learning (ML) techniques in the field of organic chemistry and its various subfields has increased the value of structured reaction data. Most data in chemistry is represented by unstructured text, and despite the vastness of the organic chemistry literature (papers, patents), conversion from unstructured text to structured data remains a largely manual endeavor. Software tools for this task would facilitate downstream applications such as reaction prediction and condition recommendation. In this study, we fine-tune a large language model (LLM) to extract reaction information from organic synthesis procedure text into structured data following the Open Reaction Database (ORD) schema, a comprehensive data structure designed for organic reactions. The fine-tuned model produces syntactically correct ORD records with an average accuracy of 91.25% for ORD "messages" (e.g., full compound, workups, or condition definitions) and 92.25% for individual data fields (e.g., compound identifiers, mass quantities), with the ability to recognize compound-referencing tokens and to infer reaction roles. We investigate its failure modes and evaluate performance on specific subtasks such as reaction role classification.
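The idea of field-level accuracy can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each extracted record is a nested dictionary (a simplified stand-in, not the actual ORD protocol-buffer schema) and scores the fraction of reference leaf fields that the prediction reproduces exactly. The helper names `flatten` and `field_accuracy` and the example fields are hypothetical; this is not the authors' evaluation code.

```python
from typing import Any, Dict

# Sentinel so a missing path never compares equal to a stored None value.
_MISSING = object()


def flatten(record: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0].c": value} leaf paths."""
    leaves: Dict[str, Any] = {}
    if isinstance(record, dict):
        for key, value in record.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            leaves.update(flatten(value, path))
    elif isinstance(record, list):
        for i, value in enumerate(record):
            leaves.update(flatten(value, f"{prefix}[{i}]"))
    else:
        # Scalar leaf: store it under its full path.
        leaves[prefix] = record
    return leaves


def field_accuracy(predicted: dict, reference: dict) -> float:
    """Fraction of reference leaf fields reproduced exactly in the prediction."""
    ref = flatten(reference)
    pred = flatten(predicted)
    if not ref:
        return 1.0  # nothing to check
    hits = sum(1 for path, value in ref.items() if pred.get(path, _MISSING) == value)
    return hits / len(ref)


# Hypothetical, simplified records (not real ORD field names).
reference = {
    "inputs": [{"name": "benzaldehyde", "amount": {"mass_g": 1.06}}],
    "conditions": {"temperature_c": 25},
}
predicted = {
    "inputs": [{"name": "benzaldehyde", "amount": {"mass_g": 1.06}}],
    "conditions": {"temperature_c": 20},
}
print(f"{field_accuracy(predicted, reference):.3f}")
```

Here the prediction gets the compound name and mass right but the temperature wrong, so it scores 2 of 3 fields. Exact-match comparison is deliberately strict; a real evaluation would likely need unit normalization and tolerance for equivalent identifiers.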
Affiliation(s)
- Qianxiang Ai
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA USA
- Fanwang Meng
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA USA
- Jiale Shi
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA USA
- Brenden Pelkie
- Department of Chemical Engineering, University of Washington Seattle WA USA
- Connor W Coley
- Department of Chemical Engineering, Massachusetts Institute of Technology Cambridge MA USA
11
Schäfer F, Lückemeier L, Glorius F. Improving reproducibility through condition-based sensitivity assessments: application, advancement and prospect. Chem Sci 2024:d4sc03017f. [PMID: 39263664 PMCID: PMC11382186 DOI: 10.1039/d4sc03017f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2024] [Accepted: 08/29/2024] [Indexed: 09/13/2024] [Open access]
Abstract
The fluctuating reproducibility of scientific reports presents a well-recognised issue, frequently stemming from insufficient standardisation, transparency and a lack of information in scientific publications. Consequently, the incorporation of newly developed synthetic methods into practical applications often occurs at a slow rate. In recent years, various efforts have been made to analyse the sensitivity of chemical methodologies and the variation in quantitative outcome observed across different laboratory environments. For today's chemists, determining the key factors that really matter for a reaction's outcome from all the different aspects of chemical methodology can be a challenging task. In response, we provide a detailed examination and customised recommendations surrounding the sensitivity screen, offering a comprehensive assessment of various strategies and exploring their diverse applications by research groups to improve the practicality of their methodologies.
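One common way to read a condition-based sensitivity screen is to compare the yield obtained under each deliberately perturbed condition with the yield under standard conditions. The sketch below is a generic one-factor-at-a-time comparison, not the specific protocol recommended in the review; the parameter names, yields, and the 10% relative-deviation threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def sensitivity_screen(
    standard_yield: float,
    perturbed_yields: Dict[str, float],
    threshold_pct: float = 10.0,
) -> List[Tuple[str, float, bool]]:
    """Compare each perturbed run with the standard-condition yield.

    Returns (parameter, relative deviation in %, exceeds threshold) tuples,
    sorted so the most sensitive parameters come first.
    """
    if standard_yield <= 0:
        raise ValueError("standard_yield must be positive")
    rows = []
    for parameter, yield_pct in perturbed_yields.items():
        deviation = 100.0 * (yield_pct - standard_yield) / standard_yield
        rows.append((parameter, deviation, abs(deviation) > threshold_pct))
    rows.sort(key=lambda row: abs(row[1]), reverse=True)
    return rows


# Hypothetical yields (%) under standard and single-parameter perturbations.
results = sensitivity_screen(
    80.0,
    {
        "higher concentration": 76.0,
        "lower temperature": 60.0,
        "added water": 79.0,
        "air atmosphere": 40.0,
    },
)
for parameter, deviation, sensitive in results:
    print(f"{parameter:22s} {deviation:+6.1f}%  {'SENSITIVE' if sensitive else 'robust'}")
```

Expressing each deviation relative to the standard yield lets reactions with very different baseline yields be compared on one scale; the threshold is a tunable choice rather than a fixed standard.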
Affiliation(s)
- Felix Schäfer
- Universität Münster, Organisch-Chemisches Institut Corrensstraße 36 48149 Münster Germany
- Lukas Lückemeier
- Universität Münster, Organisch-Chemisches Institut Corrensstraße 36 48149 Münster Germany
- Frank Glorius
- Universität Münster, Organisch-Chemisches Institut Corrensstraße 36 48149 Münster Germany
12
Zhang X, Li Y, Li C, Zhu J, Gan Z, Wang L, Sun X, You H. A chemical reaction entity recognition method based on a natural language data augmentation strategy. Chem Commun (Camb) 2024; 60:9610-9613. [PMID: 39148332 DOI: 10.1039/d4cc01471e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2024]
Abstract
Impressive applications of artificial intelligence in the field of chemical reaction prediction depend heavily on abundant, reliable datasets. The automated extraction of reaction procedures to build structured chemical databases is of growing importance. Here, we propose a novel model named DACRER for large-scale reaction extraction, in which transfer learning and a data augmentation strategy were employed. This model was evaluated on chemical datasets and showed good performance in identifying and processing chemical texts.
Affiliation(s)
- Xiaowen Zhang
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China.
- Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, Anhui, China
- Chaoyi Li
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China.
- Jingyuan Zhu
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China.
- Zhiqiang Gan
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China.
- Lei Wang
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, Shandong, China
- Xiaofei Sun
- School of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, Shandong, China
- Hengzhi You
- School of Science, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China.
- Green Pharmaceutical Engineering Research Center, Harbin Institute of Technology (Shenzhen), Shenzhen 518055, Guangdong, China
13
Tom G, Schmid SP, Baird SG, Cao Y, Darvish K, Hao H, Lo S, Pablo-García S, Rajaonson EM, Skreta M, Yoshikawa N, Corapi S, Akkoc GD, Strieth-Kalthoff F, Seifrid M, Aspuru-Guzik A. Self-Driving Laboratories for Chemistry and Materials Science. Chem Rev 2024; 124:9633-9732. [PMID: 39137296 PMCID: PMC11363023 DOI: 10.1021/acs.chemrev.4c00055] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Self-driving laboratories (SDLs) promise an accelerated application of the scientific method. Through the automation of experimental workflows, along with autonomous experimental planning, SDLs hold the potential to greatly accelerate research in chemistry and materials discovery. This review provides an in-depth analysis of the state-of-the-art in SDL technology, its applications across various scientific disciplines, and the potential implications for research and industry. This review additionally provides an overview of the enabling technologies for SDLs, including their hardware, software, and integration with laboratory infrastructure. Most importantly, this review explores the diverse range of scientific domains where SDLs have made significant contributions, from drug discovery and materials science to genomics and chemistry. We provide a comprehensive review of existing real-world examples of SDLs, their different levels of automation, and the challenges and limitations associated with each domain.
Affiliation(s)
- Gary Tom
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Stefan P. Schmid
- Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, CH-8093 Zurich, Switzerland
- Sterling G. Baird
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Yang Cao
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Kourosh Darvish
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Han Hao
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Stanley Lo
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Sergio Pablo-García
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Ella M. Rajaonson
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Marta Skreta
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Naruki Yoshikawa
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Samantha Corapi
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Gun Deniz Akkoc
- Forschungszentrum Jülich GmbH, Helmholtz Institute for Renewable Energy Erlangen-Nürnberg, Cauerstr. 1, 91058 Erlangen, Germany
- Department of Chemical and Biological Engineering, Friedrich-Alexander Universität Erlangen-Nürnberg, Egerlandstr. 3, 91058 Erlangen, Germany
- Felix Strieth-Kalthoff
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- School of Mathematics and Natural Sciences, University of Wuppertal, Gaußstraße 20, 42119 Wuppertal, Germany
- Martin Seifrid
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Department of Materials Science and Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States of America
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada
- Acceleration Consortium, 80 St. George St, Toronto, Ontario M5S 3H6, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science & Engineering, University of Toronto, Toronto, Ontario M5S 3E4, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G 1M1, Canada
14
Su Y, Wang X, Ye Y, Xie Y, Xu Y, Jiang Y, Wang C. Automation and machine learning augmented by large language models in a catalysis study. Chem Sci 2024; 15:12200-12233. [PMID: 39118602 PMCID: PMC11304797 DOI: 10.1039/d3sc07012c] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2023] [Accepted: 06/21/2024] [Indexed: 08/10/2024] [Open access]
Abstract
Recent advancements in artificial intelligence and automation are transforming catalyst discovery and design from a traditional, manual, trial-and-error mode into intelligent, high-throughput digital methodologies. This transformation is driven by four key components: high-throughput information extraction, automated robotic experimentation, real-time feedback for iterative optimization, and interpretable machine learning for generating new knowledge. These innovations have given rise to self-driving labs and significantly accelerated materials research. Over the past two years, the emergence of large language models (LLMs) has added a new dimension to this field, providing unprecedented flexibility in information integration, decision-making, and interaction with human researchers. This review explores how LLMs are reshaping catalyst design, heralding a revolutionary change in the field.
Affiliation(s)
- Yuming Su
- iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM) Xiamen 361005 P. R. China
- Xue Wang
- iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 P. R. China
- Yuanxiang Ye
- Institute of Artificial Intelligence, Xiamen University Xiamen 361005 P. R. China
- Yibo Xie
- Institute of Artificial Intelligence, Xiamen University Xiamen 361005 P. R. China
- Yujing Xu
- iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 P. R. China
- Yibin Jiang
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM) Xiamen 361005 P. R. China
- Cheng Wang
- iChem, State Key Laboratory of Physical Chemistry of Solid Surfaces, College of Chemistry and Chemical Engineering, Xiamen University Xiamen 361005 P. R. China
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM) Xiamen 361005 P. R. China
15
Zhang W, Wang Q, Kong X, Xiong J, Ni S, Cao D, Niu B, Chen M, Li Y, Zhang R, Wang Y, Zhang L, Li X, Xiong Z, Shi Q, Huang Z, Fu Z, Zheng M. Fine-tuning large language models for chemical text mining. Chem Sci 2024; 15:10600-10611. [PMID: 38994403 PMCID: PMC11234886 DOI: 10.1039/d4sc00924j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2024] [Accepted: 06/02/2024] [Indexed: 07/13/2024] [Open access]
Abstract
Extracting knowledge from complex and diverse chemical texts is a pivotal task for both experimental and computational chemists. The task is still considered to be extremely challenging due to the complexity of the chemical language and scientific literature. This study explored the power of fine-tuned large language models (LLMs) on five intricate chemical text mining tasks: compound entity recognition, reaction role labelling, metal-organic framework (MOF) synthesis information extraction, nuclear magnetic resonance spectroscopy (NMR) data extraction, and the conversion of reaction paragraphs to action sequences. The fine-tuned LLMs demonstrated impressive performance, significantly reducing the need for repetitive and extensive prompt engineering experiments. For comparison, we guided ChatGPT (GPT-3.5-turbo) and GPT-4 with prompt engineering and fine-tuned GPT-3.5-turbo as well as other open-source LLMs such as Mistral, Llama3, Llama2, T5, and BART. The results showed that the fine-tuned ChatGPT models excelled in all tasks. They achieved exact accuracy levels ranging from 69% to 95% on these tasks with minimal annotated data. They even outperformed those task-adaptive pre-training and fine-tuning models that were based on a significantly larger amount of in-domain data. Notably, fine-tuned Mistral and Llama3 show competitive abilities. Given their versatility, robustness, and low-code capability, leveraging fine-tuned LLMs as flexible and effective toolkits for automated data acquisition could revolutionize chemical knowledge extraction.
Affiliation(s)
- Wei Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Qinggong Wang
- Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China
- Xiangtai Kong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Jiacheng Xiong
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Shengkun Ni
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Duanhua Cao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University Hangzhou Zhejiang 310058 China
- Buying Niu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Mingan Chen
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- School of Physical Science and Technology, ShanghaiTech University Shanghai 201210 China
- Lingang Laboratory Shanghai 200031 China
- Yameng Li
- ProtonUnfold Technology Co., Ltd Suzhou China
- Runze Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Yitian Wang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Lehan Zhang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Qian Shi
- Lingang Laboratory Shanghai 200031 China
- Ziming Huang
- Medizinische Klinik und Poliklinik I, Klinikum der Universität München, Ludwig-Maximilians-Universität Munich Germany
- Zunyun Fu
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- Mingyue Zheng
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences 555 Zuchongzhi Road Shanghai 201203 China
- University of Chinese Academy of Sciences No. 19A Yuquan Road Beijing 100049 China
- Nanjing University of Chinese Medicine 138 Xianlin Road Nanjing 210023 China
16
Zhang Y, Chen F, Liu Z, Ju Y, Cui D, Zhu J, Jiang X, Guo X, He J, Zhang L, Zhang X, Su Y. A materials terminology knowledge graph automatically constructed from text corpus. Sci Data 2024; 11:600. [PMID: 38849436 PMCID: PMC11161478 DOI: 10.1038/s41597-024-03448-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Accepted: 05/31/2024] [Indexed: 06/09/2024] [Open access]
Abstract
A scalable, reusable, and broad-coverage unified representation of materials knowledge is important and would greatly benefit data sharing among materials communities. A knowledge graph (KG) for materials terminology, which is a formal collection of term entities and relationships, is conceptually important to achieve this goal. In this work, we propose a KG for materials terminology, named Materials Genome Engineering Database Knowledge Graph (MGED-KG), which is automatically constructed from a text corpus via natural language processing. MGED-KG is the most comprehensive KG for materials terminology in both Chinese and English, consisting of 8,660 terms and their explanations. It encompasses 11 principal categories, such as Metals, Composites, and Nanomaterials, each with two or three levels of subcategories, resulting in a total of 235 distinct category labels. For further application, a knowledge web system based on MGED-KG was developed; it improves data-sharing efficiency through query expansion and term and data recommendation.
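Because the KG organizes terms into principal categories with nested subcategories, query expansion can be pictured as walking down that hierarchy and collecting synonyms along the way. The sketch below assumes, for illustration only, that the graph is held as plain parent-to-children and term-to-synonyms dictionaries; apart from "Metals", the category names and synonyms are hypothetical, and this is not the MGED-KG web system's actual implementation.

```python
from collections import deque
from typing import Dict, List, Set


def expand_query(
    term: str,
    children: Dict[str, List[str]],
    synonyms: Dict[str, List[str]],
) -> Set[str]:
    """Return the term, all of its descendants, and the synonyms of each."""
    expanded: Set[str] = set()
    visited: Set[str] = set()
    queue = deque([term])
    while queue:  # breadth-first walk down the category hierarchy
        current = queue.popleft()
        if current in visited:
            continue
        visited.add(current)
        expanded.add(current)
        expanded.update(synonyms.get(current, []))
        queue.extend(children.get(current, []))
    return expanded


# Hypothetical fragment of a bilingual (English/Chinese) terminology hierarchy.
children = {
    "Metals": ["Ferrous metals", "Non-ferrous metals"],
    "Non-ferrous metals": ["Aluminium alloys", "Titanium alloys"],
}
synonyms = {"Metals": ["金属"], "Titanium alloys": ["Ti alloys"]}

print(sorted(expand_query("Non-ferrous metals", children, synonyms)))
```

A search for "Non-ferrous metals" would then also match records tagged only with a subcategory such as "Titanium alloys" or its abbreviation. The separate `visited` set keeps the walk safe if a term appears under more than one parent.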
Affiliation(s)
- Yuwei Zhang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Fangyi Chen
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Zeyi Liu
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Yunzhuo Ju
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Dongliang Cui
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Jinyi Zhu
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Xue Jiang
- Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China.
- Liaoning Academy of Materials, Shenyang, 110000, Liaoning, China.
- Shunde Innovation School, University of Science and Technology Beijing, Guangdong, 528399, China.
- Xi Guo
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China.
- Beijing Key Laboratory of Knowledge Engineering for Materials, Beijing, 100083, China.
- Jie He
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China.
- Liaoning Academy of Materials, Shenyang, 110000, Liaoning, China.
- Lei Zhang
- Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaotong Zhang
- School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, 100083, China
- Yanjing Su
- Beijing Advanced Innovation Center for Materials Genome Engineering, Institute for Advanced Materials and Technology, University of Science and Technology Beijing, Beijing, 100083, China
17
Wang G, Wang C, Zhang X, Li Z, Zhou J, Sun Z. Machine learning interatomic potential: Bridge the gap between small-scale models and realistic device-scale simulations. iScience 2024; 27:109673. [PMID: 38646181 PMCID: PMC11033164 DOI: 10.1016/j.isci.2024.109673] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2024] [Open access]
Abstract
Machine learning interatomic potentials (MLIPs) overcome the high computational cost of density functional theory and the relatively low accuracy of classical large-scale molecular dynamics, enabling more efficient and precise simulations in materials research and design. In this review, the current state of the four essential stages of MLIP development is discussed: data generation methods, material structure descriptors, six distinct machine learning algorithms, and available software. Furthermore, applications of MLIPs in various fields are examined, notably phase-change memory materials, structure searching, material property prediction, and pre-trained universal models. Finally, future perspectives are discussed, including standard datasets, transferability, generalization, and the trade-off between accuracy and complexity in MLIPs.
Affiliation(s)
- Guanjie Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- School of Integrated Circuit Science and Engineering, Beihang University, Beijing 100191, China
- Changrui Wang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Xuanguang Zhang
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zefeng Li
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Jian Zhou
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
- Zhimei Sun
- School of Materials Science and Engineering, Beihang University, Beijing 100191, China
18
Strieth-Kalthoff F, Hao H, Rathore V, Derasp J, Gaudin T, Angello NH, Seifrid M, Trushina E, Guy M, Liu J, Tang X, Mamada M, Wang W, Tsagaantsooj T, Lavigne C, Pollice R, Wu TC, Hotta K, Bodo L, Li S, Haddadnia M, Wołos A, Roszak R, Ser CT, Bozal-Ginesta C, Hickman RJ, Vestfrid J, Aguilar-Granda A, Klimareva EL, Sigerson RC, Hou W, Gahler D, Lach S, Warzybok A, Borodin O, Rohrbach S, Sanchez-Lengeling B, Adachi C, Grzybowski BA, Cronin L, Hein JE, Burke MD, Aspuru-Guzik A. Delocalized, asynchronous, closed-loop discovery of organic laser emitters. Science 2024; 384:eadk9227. [PMID: 38753786 DOI: 10.1126/science.adk9227] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 09/18/2023] [Accepted: 04/05/2024] [Indexed: 05/18/2024]
Abstract
Contemporary materials discovery requires intricate sequences of synthesis, formulation, and characterization that often span multiple locations with specialized expertise or instrumentation. To accelerate these workflows, we present a cloud-based strategy that enabled delocalized and asynchronous design-make-test-analyze cycles. We showcased this approach through the exploration of molecular gain materials for organic solid-state lasers as a frontier application in molecular optoelectronics. Distributed robotic synthesis and in-line property characterization, orchestrated by a cloud-based artificial intelligence experiment planner, resulted in the discovery of 21 new state-of-the-art materials. Gram-scale synthesis ultimately allowed for the verification of best-in-class stimulated emission in a thin-film device. Demonstrating the asynchronous integration of five laboratories across the globe, this workflow provides a blueprint for delocalizing (and democratizing) scientific discovery.
Affiliation(s)
- Felix Strieth-Kalthoff
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Han Hao
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- Vandana Rathore
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Joshua Derasp
- Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Théophile Gaudin
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Nicholas H Angello
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab Institute, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Martin Seifrid
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Department of Materials Science and Engineering, North Carolina State University, Raleigh, NC, USA
- Mason Guy
- Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Junliang Liu
- Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Xun Tang
- Center for Organic Photonics and Electronics Research (OPERA), Kyushu University, Fukuoka, Japan
- Masashi Mamada
- Center for Organic Photonics and Electronics Research (OPERA), Kyushu University, Fukuoka, Japan
- Wesley Wang
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab Institute, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Tuul Tsagaantsooj
- Center for Organic Photonics and Electronics Research (OPERA), Kyushu University, Fukuoka, Japan
- Cyrille Lavigne
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Robert Pollice
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Tony C Wu
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Kazuhiro Hotta
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Mitsubishi Chemical Corporation Science & Innovation Center, Kanagawa, Japan
- Leticia Bodo
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Shangyu Li
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Mohammad Haddadnia
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Agnieszka Wołos
- Allchemy Inc., Highland, IN, USA
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Rafał Roszak
- Allchemy Inc., Highland, IN, USA
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Cher Tian Ser
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Carlota Bozal-Ginesta
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Catalonia Institute for Energy Research, Barcelona, Spain
- Riley J Hickman
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Jenya Vestfrid
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Andrés Aguilar-Granda
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Wenduan Hou
- School of Chemistry, University of Glasgow, Glasgow, UK
- Daniel Gahler
- School of Chemistry, University of Glasgow, Glasgow, UK
- Slawomir Lach
- School of Chemistry, University of Glasgow, Glasgow, UK
- Adrian Warzybok
- School of Chemistry, University of Glasgow, Glasgow, UK
- Department of Chemical Physics, Jagiellonian University, Krakow, Poland
- Oleg Borodin
- School of Chemistry, University of Glasgow, Glasgow, UK
- Chihaya Adachi
- Center for Organic Photonics and Electronics Research (OPERA), Kyushu University, Fukuoka, Japan
- Bartosz A Grzybowski
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Center for Algorithmic and Robotized Synthesis, Institute for Basic Science, Ulsan, Republic of Korea
- Department of Chemistry, Ulsan Institute of Science and Technology, Ulsan, Republic of Korea
- Leroy Cronin
- Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- School of Chemistry, University of Glasgow, Glasgow, UK
- Jason E Hein
- Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- Department of Chemistry, University of British Columbia, Vancouver, BC, Canada
- Department of Chemistry, University of Bergen, Bergen, Norway
- Martin D Burke
- Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab, Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Molecule Maker Lab Institute, Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Acceleration Consortium, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Materials Science and Engineering, University of Toronto, Toronto, ON, Canada
- Canadian Institute for Advanced Research (CIFAR), Toronto, ON, Canada
19
Coin G, Jiang T, Bordi S, Nichols PL, Bode JW, Wanner BM. Automated, Capsule-Based Suzuki-Miyaura Cross Couplings. Org Lett 2024; 26:2708-2712. [PMID: 37126221 DOI: 10.1021/acs.orglett.3c01057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/02/2023]
Abstract
The development of an automated process for Suzuki-Miyaura cross couplings is described, in which the complete reaction, workup, and product isolation are effected automatically with no user involvement, aside from loading of the starting materials and reaction capsule. This practical and simple method was successfully demonstrated to provide the desired biaryl products using a range of aryl bromides and boronic acids and is also effective for the late-stage functionalization of aryl halides in bioactive molecules.
Affiliation(s)
- Guillaume Coin
- Synple Chem AG, Kemptpark 18, 8310 Kemptthal, Switzerland
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Tuo Jiang
- Synple Chem AG, Kemptpark 18, 8310 Kemptthal, Switzerland
- Samuele Bordi
- Synple Chem AG, Kemptpark 18, 8310 Kemptthal, Switzerland
- Paula L Nichols
- Synple Chem AG, Kemptpark 18, 8310 Kemptthal, Switzerland
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
- Jeffrey W Bode
- Laboratory of Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, 8093 Zürich, Switzerland
20
Wagner F, Sagmeister P, Jusner CE, Tampone TG, Manee V, Buono FG, Williams JD, Kappe CO. A Slug Flow Platform with Multiple Process Analytics Facilitates Flexible Reaction Optimization. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2308034. [PMID: 38273711 PMCID: PMC10987115 DOI: 10.1002/advs.202308034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 12/21/2023] [Indexed: 01/27/2024]
Abstract
Flow processing offers many opportunities to optimize reactions rapidly and automatically, yet it often requires relatively large quantities of input materials. To address this, a flexible slug flow reactor equipped with two analytical instruments is reported for low-volume optimization experiments. A Buchwald-Hartwig amination toward the drug olanzapine, with six independent optimizable variables, is optimized using three different automated approaches: self-optimization, design of experiments, and kinetic modeling. These approaches are complementary and provide different information on the reaction: Pareto-optimal operating points, response surface models, and mechanistic models, respectively. The results are achieved using <10% of the material that standard flow operation would require. Finally, a chemometric model is built using automated data handling, and three subsequent validation experiments demonstrate good agreement between the slug flow reactor and a standard (larger-scale) flow reactor.
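As a rough illustration of the design-of-experiments route, the sketch below builds a two-level full factorial design for six coded variables (2^6 = 64 runs) and estimates each factor's main effect. The factor names and the toy yield function are assumptions for illustration only; the abstract does not specify which design the authors used.

```python
from itertools import product


def full_factorial(n_factors):
    """All 2**n_factors combinations of the coded levels -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def main_effects(design, responses):
    """Main effect of factor j = mean(y | x_j = +1) - mean(y | x_j = -1)."""
    effects = []
    for j in range(len(design[0])):
        hi = [y for run, y in zip(design, responses) if run[j] == 1]
        lo = [y for run, y in zip(design, responses) if run[j] == -1]
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects


# Hypothetical coded factors; not the variables reported in the paper.
factors = ["temp", "t_res", "cat", "base", "amine", "conc"]
design = full_factorial(len(factors))


def toy_yield(run):
    # Toy response with one temp x cat interaction, standing in for a measured yield.
    temp, t_res, cat, base, amine, conc = run
    return 60 + 8 * temp + 5 * t_res + 3 * cat - 2 * conc + 1.5 * temp * cat


responses = [toy_yield(run) for run in design]
for name, effect in zip(factors, main_effects(design, responses)):
    print(f"{name:6s} {effect:+.1f}")
```

A two-level design like this estimates linear and interaction effects only; fitting a response surface with curvature usually needs extra center or axial points (for example, a central composite design).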
Affiliation(s)
- Florian Wagner
- Center for Continuous Flow Synthesis and Processing (CC FLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, Graz, 8010, Austria
- Institute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, Graz, 8010, Austria
- Peter Sagmeister
- Center for Continuous Flow Synthesis and Processing (CC FLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, Graz, 8010, Austria
- Institute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, Graz, 8010, Austria
- Clemens E. Jusner
- Center for Continuous Flow Synthesis and Processing (CC FLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, Graz, 8010, Austria
- Institute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, Graz, 8010, Austria
- Thomas G. Tampone
- Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, CT, 06877, USA
- Vidhyadhar Manee
- Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, CT, 06877, USA
- Frederic G. Buono
- Boehringer Ingelheim Pharmaceuticals, Inc., 900 Ridgebury Road, Ridgefield, CT, 06877, USA
- Jason D. Williams
- Center for Continuous Flow Synthesis and Processing (CC FLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, Graz, 8010, Austria
- Institute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, Graz, 8010, Austria
- C. Oliver Kappe
- Center for Continuous Flow Synthesis and Processing (CC FLOW), Research Center Pharmaceutical Engineering GmbH (RCPE), Inffeldgasse 13, Graz, 8010, Austria
- Institute of Chemistry, University of Graz, NAWI Graz, Heinrichstrasse 28, Graz, 8010, Austria
21
Leonov AI, Hammer AJS, Lach S, Mehr SHM, Caramelli D, Angelone D, Khan A, O'Sullivan S, Craven M, Wilbraham L, Cronin L. An integrated self-optimizing programmable chemical synthesis and reaction engine. Nat Commun 2024; 15:1240. [PMID: 38336880 PMCID: PMC10858227 DOI: 10.1038/s41467-024-45444-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Accepted: 01/22/2024] [Indexed: 02/12/2024]
Abstract
Robotic platforms for chemistry are developing rapidly, but most systems cannot currently adapt to changing circumstances in real time. We present a dynamically programmable system capable of making, optimizing, and discovering new molecules, which uses seven sensors to continuously monitor the reaction. Using a newly developed dynamic programming language, we demonstrate the 10-fold scale-up of a highly exothermic oxidation reaction, end-point detection, and the detection of critical hardware failures. We also show how in-line analytics such as HPLC, Raman, and NMR spectroscopy can be used for closed-loop optimization of reactions, exemplified by the Van Leusen oxazole synthesis, a four-component Ugi condensation, and manganese-catalysed epoxidation reactions, as well as two previously unreported reactions discovered from a selected chemical space, providing up to 50% yield improvement over 25-50 iterations. Finally, we demonstrate an experimental pipeline that explores a trifluoromethylation reaction space and discovers new molecules.
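The closed-loop optimization described above can be pictured as a propose-run-update loop. The sketch below uses a simple adaptive random local search ((1+1)-style) on a hypothetical `run_experiment` function with a made-up yield surface; it is a generic stand-in, not the optimization algorithm or the programming language used in the paper.

```python
import random


def run_experiment(temperature, equivalents):
    """Hypothetical stand-in for an automated reaction plus in-line yield readout (0-100 %)."""
    # Toy yield surface with its optimum at 60 °C and 1.8 equivalents.
    y = 90 - 0.02 * (temperature - 60) ** 2 - 30 * (equivalents - 1.8) ** 2
    return max(0.0, min(100.0, y))


def closed_loop(n_iter=40, seed=1):
    rng = random.Random(seed)
    bounds = {"temperature": (20.0, 100.0), "equivalents": (1.0, 3.0)}
    best = {"temperature": 25.0, "equivalents": 1.0}
    best_y = run_experiment(**best)
    history = [best_y]
    step = 0.2  # perturbation size as a fraction of each variable's range
    for _ in range(n_iter):
        # Propose: Gaussian perturbation of the incumbent, clipped to the bounds.
        cand = {
            k: min(hi, max(lo, best[k] + rng.gauss(0.0, step * (hi - lo))))
            for k, (lo, hi) in bounds.items()
        }
        y = run_experiment(**cand)  # "run" the experiment and measure
        history.append(y)
        if y > best_y:  # improvement: accept it and widen the search a little
            best, best_y = cand, y
            step = min(0.5, step * 1.5)
        else:  # no improvement: narrow the search
            step = max(0.01, step * 0.9)
    return best, best_y, history


best, best_y, history = closed_loop()
print(f"best {best}, yield {best_y:.1f}% after {len(history) - 1} experiments")
```

In a real platform the proposal step would typically be a model-based optimizer, and the measurement would come from the in-line analytics rather than a formula.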
Affiliation(s)
- Artem I Leonov
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Alexander J S Hammer
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Slawomir Lach
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- S Hessam M Mehr
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Dario Caramelli
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Davide Angelone
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Aamir Khan
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Steven O'Sullivan
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Matthew Craven
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Liam Wilbraham
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK
- Leroy Cronin
- School of Chemistry, The University of Glasgow, University Avenue, Glasgow, G12 8QQ, UK.
22
Wang Z, Chen A, Tao K, Han Y, Li J. MatGPT: A Vane of Materials Informatics from Past, Present, to Future. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2024; 36:e2306733. [PMID: 37813548 DOI: 10.1002/adma.202306733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 09/05/2023] [Indexed: 10/17/2023]
Abstract
Combining materials science, artificial intelligence (AI), physical chemistry, and other disciplines, materials informatics is continuously accelerating the vigorous development of new materials. The emergence of "GPT (Generative Pre-trained Transformer) AI" shows that scientific research has entered an era of intelligent civilization, with "data" as the basic factor and "algorithm + computing power" as the core productivity. Continuing innovation in AI will affect cognitive laws and scientific methods and reconstruct systems of knowledge and wisdom, which prompts a fresh look at materials informatics. Here, a comprehensive discussion of AI models and materials infrastructures is provided, and advances in the discovery and design of new materials are reviewed. In light of the new research paradigms triggered by "AI for Science", "MatGPT" is proposed as the vane of materials informatics, and a technical roadmap is laid out covering data, descriptors, generative models, pretraining models, directed design models, collaborative training, and experimental robots, together with the efforts and preparations needed to develop a new generation of materials informatics. Finally, the challenges and constraints facing materials informatics are discussed, with the aim of making materials informatics more digital, intelligent, and automated through the joint efforts of interdisciplinary scientists.
Affiliation(s)
- Zhilong Wang
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- An Chen
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Kehao Tao
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanqiang Han
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
- Jinjin Li
- National Key Laboratory of Science and Technology on Micro/Nano Fabrication, Shanghai Jiao Tong University, Shanghai, 200240, China
- Key Laboratory of Thin Film and Microfabrication of Ministry of Education, Department of Micro/Nano Electronics, Shanghai Jiao Tong University, Shanghai, 200240, China
23
Bai J, Mosbach S, Taylor CJ, Karan D, Lee KF, Rihm SD, Akroyd J, Lapkin AA, Kraft M. A dynamic knowledge graph approach to distributed self-driving laboratories. Nat Commun 2024; 15:462. [PMID: 38263405 PMCID: PMC10805810 DOI: 10.1038/s41467-023-44599-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 12/21/2023] [Indexed: 01/25/2024]
Abstract
The ability to integrate resources and share knowledge across organisations empowers scientists to expedite the scientific discovery process. This is especially crucial in addressing emerging global challenges that require global solutions. In this work, we develop an architecture for distributed self-driving laboratories within The World Avatar project, which seeks to create an all-encompassing digital twin based on a dynamic knowledge graph. We employ ontologies to capture data and material flows in design-make-test-analyse cycles, utilising autonomous agents as executable knowledge components to carry out the experimentation workflow. Data provenance is recorded to ensure its findability, accessibility, interoperability, and reusability. We demonstrate the practical application of our framework by linking two robots in Cambridge and Singapore for a collaborative closed-loop optimisation for a pharmaceutically-relevant aldol condensation reaction in real-time. The knowledge graph autonomously evolves toward the scientist's research goals, with the two robots effectively generating a Pareto front for cost-yield optimisation in three days.
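The cost-yield Pareto front mentioned above is the set of experiments that no other experiment beats on both objectives at once. A minimal Python sketch of that dominance test, on hypothetical (cost, yield) data, is below; it does not reproduce the knowledge-graph agents or the optimizer used in the work.

```python
def pareto_front(points):
    """Return the non-dominated (cost, yield) points; lower cost and higher yield are better."""
    front = []
    for i, (c_i, y_i) in enumerate(points):
        # Point i is dominated if some other point is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(
            c_j <= c_i and y_j >= y_i and (c_j < c_i or y_j > y_i)
            for j, (c_j, y_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((c_i, y_i))
    return sorted(front)


# Hypothetical experiment results as (cost per gram, yield %).
results = [(1.0, 40.0), (2.0, 55.0), (2.5, 50.0), (3.0, 80.0), (4.0, 78.0), (5.0, 85.0)]
print(pareto_front(results))  # [(1.0, 40.0), (2.0, 55.0), (3.0, 80.0), (5.0, 85.0)]
```

This pairwise check is O(n^2), which is fine for the tens to hundreds of experiments a closed-loop campaign typically produces.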
Affiliation(s)
- Jiaru Bai
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Sebastian Mosbach
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Connor J Taylor
- Astex Pharmaceuticals, 436 Cambridge Science Park Milton Road, Cambridge, CB4 0QA, UK
- Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Faculty of Engineering, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Dogancan Karan
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Kok Foong Lee
- CMCL Innovations, Sheraton House, Cambridge, CB3 0AX, UK
- Simon D Rihm
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Jethro Akroyd
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Alexei A Lapkin
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore
- Innovation Centre in Digital Molecular Technologies, Yusuf Hamied Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- Markus Kraft
- Department of Chemical Engineering and Biotechnology, University of Cambridge, Philippa Fawcett Drive, Cambridge, CB3 0AS, UK.
- Cambridge Centre for Advanced Research and Education in Singapore (CARES), 1 Create Way, CREATE Tower, #05-05, Singapore, 138602, Singapore.
- School of Chemical and Biomedical Engineering, Nanyang Technological University, 62 Nanyang Drive, 637459, Singapore, Singapore.
- The Alan Turing Institute, London, NW1 2DB, UK.
24
Voinarovska V, Kabeshov M, Dudenko D, Genheden S, Tetko IV. When Yield Prediction Does Not Yield Prediction: An Overview of the Current Challenges. J Chem Inf Model 2024; 64:42-56. [PMID: 38116926 PMCID: PMC10778086 DOI: 10.1021/acs.jcim.3c01524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 11/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023]
Abstract
Machine learning (ML) techniques face significant challenges when predicting advanced chemical properties such as yield, feasibility of chemical synthesis, and optimal reaction conditions. These challenges stem from the high-dimensional nature of the prediction task and the myriad essential variables involved, ranging from reactants and reagents to catalysts, temperature, and purification processes. A reliable predictive model could not only help optimize high-throughput experiments but also improve existing retrosynthetic prediction approaches and support many other applications in the field. In this review, we systematically evaluate the efficacy of current ML methodologies in chemoinformatics, highlighting their milestones and inherent limitations. In addition, a detailed examination of a representative case study provides insights into prevailing issues of data availability and transferability in the discipline.
Affiliation(s)
- Varvara Voinarovska
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- TUM Graduate School, Faculty of Chemistry, Technical University of Munich, 85748 Garching, Germany
- Mikhail Kabeshov
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Dmytro Dudenko
- Enamine Ltd., 78 Chervonotkatska str., 02094 Kyiv, Ukraine
- Samuel Genheden
- Molecular AI, Discovery Sciences R&D, AstraZeneca, 431 83 Gothenburg, Sweden
- Igor V. Tetko
- Molecular Targets and Therapeutics Center, Helmholtz Munich − Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), Institute of Structural Biology, 85764 Neuherberg, Germany
25
Koscher BA, Canty RB, McDonald MA, Greenman KP, McGill CJ, Bilodeau CL, Jin W, Wu H, Vermeire FH, Jin B, Hart T, Kulesza T, Li SC, Jaakkola TS, Barzilay R, Gómez-Bombarelli R, Green WH, Jensen KF. Autonomous, multiproperty-driven molecular discovery: From predictions to measurements and back. Science 2023; 382:eadi1407. [PMID: 38127734 DOI: 10.1126/science.adi1407] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/05/2023] [Accepted: 11/09/2023] [Indexed: 12/23/2023]
Abstract
A closed-loop, autonomous molecular discovery platform driven by integrated machine learning tools was developed to accelerate the design of molecules with desired properties. We demonstrated two case studies on dye-like molecules, targeting absorption wavelength, lipophilicity, and photooxidative stability. In the first study, the platform experimentally realized 294 unreported molecules across three automatic iterations of molecular design-make-test-analyze cycles while exploring the structure-function space of four rarely reported scaffolds. In each iteration, the property prediction models that guided exploration learned the structure-property space of diverse scaffold derivatives, which were realized with multistep syntheses and a variety of reactions. The second study exploited property models trained on the explored chemical space and previously reported molecules to discover nine top-performing molecules within a lightly explored structure-property space.
Affiliation(s)
- Brent A Koscher
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Richard B Canty
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Matthew A McDonald
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Kevin P Greenman
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Charles J McGill
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Camille L Bilodeau
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Wengong Jin
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Haoyang Wu
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Brooke Jin
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Travis Hart
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Timothy Kulesza
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Shih-Cheng Li
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Tommi S Jaakkola
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Regina Barzilay
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA, USA
- Rafael Gómez-Bombarelli
- Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Klavs F Jensen
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
26
Suvarna M, Vaucher AC, Mitchell S, Laino T, Pérez-Ramírez J. Language models and protocol standardization guidelines for accelerating synthesis planning in heterogeneous catalysis. Nat Commun 2023; 14:7964. [PMID: 38042926 PMCID: PMC10693572 DOI: 10.1038/s41467-023-43836-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 11/22/2023] [Indexed: 12/04/2023]
Abstract
Synthesis protocol exploration is paramount in catalyst discovery, yet keeping pace with rapid literature advances is increasingly time intensive. Automated synthesis protocol analysis is attractive for swiftly identifying opportunities and informing predictive models; however, such applications in heterogeneous catalysis remain limited. In this proof-of-concept, we introduce a transformer model for this task, exemplified using single-atom heterogeneous catalysts (SACs), a rapidly expanding catalyst family. Our model adeptly converts SAC protocols into action sequences, and we use this output to facilitate statistical inference of their synthesis trends and applications, potentially expediting literature review and analysis. We demonstrate the model's adaptability across distinct heterogeneous catalyst families, underscoring its versatility. Finally, our study highlights a critical issue: the lack of standardization in reporting protocols hampers machine-reading capabilities. Embracing digital advances in catalysis demands a shift in data reporting norms, and to this end, we offer guidelines for writing protocols that significantly improve machine-readability. We release our model as an open-source web application, inviting a fresh approach to accelerate heterogeneous catalysis synthesis planning.
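To show what statistical inference over extracted action sequences can look like, the sketch below represents each protocol as a list of (action, parameters) pairs and counts action usage and step transitions. The action vocabulary, parameters, and protocol labels are hypothetical and do not reflect the released model's output schema.

```python
from collections import Counter

# Hypothetical action sequences, in the spirit of a protocol-to-actions model output.
# Action names and parameter keys are illustrative assumptions only.
protocols = {
    "SAC-A": [("mix", {"solvent": "water"}), ("stir", {"time_h": 12}),
              ("dry", {"temp_C": 80}), ("calcine", {"temp_C": 550, "atm": "Ar"})],
    "SAC-B": [("impregnate", {"metal": "Pt"}), ("dry", {"temp_C": 100}),
              ("calcine", {"temp_C": 400, "atm": "air"})],
    "SAC-C": [("mix", {"solvent": "ethanol"}), ("dry", {"temp_C": 60}),
              ("pyrolyze", {"temp_C": 900, "atm": "N2"})],
}


def action_frequencies(seqs):
    """Number of protocols that use each action at least once."""
    return Counter(a for seq in seqs.values() for a in {step for step, _ in seq})


def step_transitions(seqs):
    """Counts of consecutive action pairs, a simple view of common synthesis routes."""
    return Counter((a[0], b[0]) for seq in seqs.values() for a, b in zip(seq, seq[1:]))


def thermal_temps(seqs, actions=("calcine", "pyrolyze")):
    """Temperatures of the high-temperature treatment steps, in protocol order."""
    return [p["temp_C"] for seq in seqs.values() for a, p in seq if a in actions]


print(action_frequencies(protocols).most_common(3))
print(step_transitions(protocols).most_common(1))
print(thermal_temps(protocols))
```

Structured output like this is what makes trend analysis a counting problem; free-text protocols written without a standard would first need the kind of machine reading the paper addresses.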
Affiliation(s)
- Manu Suvarna
- Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, 8093, Zurich, Switzerland
- Sharon Mitchell
- Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, 8093, Zurich, Switzerland
- Teodoro Laino
- IBM Research Europe, Säumerstrasse 4, 8803, Rüschlikon, Switzerland.
- Javier Pérez-Ramírez
- Institute for Chemical and Bioengineering, Department of Chemistry and Applied Biosciences, ETH Zurich, Vladimir-Prelog-Weg 1, 8093, Zurich, Switzerland.
27
Feng S, Cai A, Wang Y, Zhang B, Qiao Q, Chen C, Wang S, Jiang J. A robotic AI-Chemist system for multi-modal AI-ready database. Natl Sci Rev 2023; 10:nwad332. [PMID: 38226367 PMCID: PMC10789233 DOI: 10.1093/nsr/nwad332] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/30/2023] [Revised: 10/16/2023] [Accepted: 10/31/2023] [Indexed: 01/17/2024]
Abstract
By fusing literature data mining, high-performance simulations, and high-accuracy experiments, robotic AI-Chemist can achieve automated high-throughput production, classification, cleaning, association and fusion of data, and thus develop a multi-modal AI-ready database.
Affiliation(s)
- Shuo Feng
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
- Aoran Cai
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
- Yang Wang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
- Baicheng Zhang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
| | - Qinyu Qiao
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
| | - Cheng Chen
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
| | - Song Wang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
| | - Jun Jiang
- Key Laboratory of Precision and Intelligent Chemistry, School of Chemistry and Materials Science, University of Science and Technology of China, China
28
Lei J, Liu Q. Difference of Convex Functions Programming With Machine-Learning Prior for the Imaging Problem in Electrical Capacitance Tomography. IEEE TRANSACTIONS ON CYBERNETICS 2023; 53:7535-7547. [PMID: 35604983 DOI: 10.1109/tcyb.2022.3173336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
The electrical capacitance tomography technology has potential benefits for the process industry by providing visualization of material distributions. One of the main technical gaps and impediments that must be overcome is the low-quality tomogram. To address this problem, this study introduces the data-guided prior and combines it with the electrical measurement mechanism and the sparsity prior to produce a new difference of convex functions programming problem that turns the image reconstruction problem into an optimization problem. The data-guided prior is learned from a provided dataset and captures the details of imaging targets since it is a specific image. A new numerical scheme that allows a complex optimization problem to be split into a few less difficult subproblems is developed to solve the challenging difference of convex functions programming problem. A new dimensionality reduction method is developed and combined with the relevance vector machine to generate a new learning engine for the forecast of the data-guided prior. The new imaging method fuses multisource information and unifies data-guided and measurement physics modeling paradigms. Performance evaluation results have validated that the new method successfully works on a series of test tasks with higher reconstruction quality and lower noise sensitivity than the popular imaging methods.
29
Machi K, Akiyama S, Nagata Y, Yoshioka M. OSPAR: A Corpus for Extraction of Organic Synthesis Procedures with Argument Roles. J Chem Inf Model 2023; 63:6619-6628. [PMID: 37859303 PMCID: PMC10647022 DOI: 10.1021/acs.jcim.3c01449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/05/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
There is a pressing need for the automated extraction of chemical reaction information because of the rapid growth of scientific documents. The previously reported works in the literature for the procedure extraction either (a) did not consider the semantic relations between the action and argument or (b) defined a detailed schema for the extraction. The former method was insufficient for reproducing the reaction, while the latter methods were too specific to their own schema and did not consider the general semantic relation between the verb and argument. In addition, they did not provide an annotated text that aligned with the structured procedure. Along these lines, in this work, we propose a corpus named organic synthesis procedures with argument roles (OSPAR) that is annotated with rolesets to consider the semantic relation between the verb and argument. We also provide rolesets for chemical reactions, especially for organic synthesis, which represent the argument roles of actions in the corpus. More specifically, we annotated 112 organic synthesis procedures in journal articles from Organic Syntheses and defined 19 new rolesets in addition to 29 rolesets from an existing language resource (Proposition Bank). After that, we constructed a simple deep learning system trained on OSPAR and discussed the usefulness of the corpus by comparing it with chemical description language (XDL) generated by a natural language processing tool, namely, SynthReader. While our system's output required more detailed parsing, it covered comparable information against XDL. Moreover, we confirmed that the validation of the output action sequence was easy as it was aligned with the original text.
Affiliation(s)
- Kojiro Machi
- Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
- Seiji Akiyama
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- Yuuya Nagata
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- Masaharu Yoshioka
- Graduate School of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
- Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21, Nishi 10, Kita-ku, Sapporo, Hokkaido 001-0021, Japan
- Faculty of Information Science and Technology, Hokkaido University, Kita 14, Nishi 9, Kita-ku, Sapporo, Hokkaido 060-0814, Japan
30
Williamson E, Brutchey RL. Using Data-Driven Learning to Predict and Control the Outcomes of Inorganic Materials Synthesis. Inorg Chem 2023; 62:16251-16262. [PMID: 37767941 PMCID: PMC10565808 DOI: 10.1021/acs.inorgchem.3c02697] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/03/2023] [Indexed: 09/29/2023]
Abstract
The design of inorganic materials for various applications critically depends on our ability to manipulate their synthesis in a rational, robust, and controllable fashion. Different from the conventional trial-and-error approach, data-driven techniques such as the design of experiments (DoE) and machine learning are an effective and more efficient way to predictably control materials synthesis. Here, we present a Viewpoint on recent progress in leveraging such techniques for predicting and controlling the outcomes of inorganic materials synthesis. We first compare how the design choice (statistical DoE vs machine learning) affects the type of control it can offer over the resulting product attributes, information elucidated, and experimental cost. These attributes are supported by discussing select case studies from the recent literature that highlight the power of these techniques for materials synthesis. The influence of experimental bias is next discussed, followed finally by our perspectives on the major challenges in the widespread implementation of predictable and controllable materials synthesis using data-driven techniques.
Affiliation(s)
- Emily M. Williamson
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
- Richard L. Brutchey
- Department of Chemistry, University of Southern California, Los Angeles, California 90089, United States
31
Zeng Z, Nie YC, Ding N, Ding QJ, Ye WT, Yang C, Sun M, E W, Zhu R, Liu Z. Transcription between human-readable synthetic descriptions and machine-executable instructions: an application of the latest pre-training technology. Chem Sci 2023; 14:9360-9373. [PMID: 37712039 PMCID: PMC10498500 DOI: 10.1039/d3sc02483k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 08/15/2023] [Indexed: 09/16/2023]
Abstract
AI has been widely applied in scientific scenarios, such as robots performing chemical synthetic actions to free researchers from monotonous experimental procedures. However, there exists a gap between human-readable natural language descriptions and machine-executable instructions, of which the former are typically in numerous chemical articles, and the latter are currently compiled manually by experts. We apply the latest technology of pre-trained models and achieve automatic transcription between descriptions and instructions. We design a concise and comprehensive schema of instructions and construct an open-source human-annotated dataset consisting of 3950 description-instruction pairs, with 9.2 operations in each instruction on average. We further propose knowledgeable pre-trained transcription models enhanced by multi-grained chemical knowledge. The performance of recent popular models and products showing great capability in automatic writing (e.g., ChatGPT) has also been explored. Experiments prove that our system improves the instruction compilation efficiency of researchers by at least 42%, and can generate fluent academic paragraphs of synthetic descriptions when given instructions, showing the great potential of pre-trained models in improving human productivity.
Affiliation(s)
- Zheni Zeng
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Yi-Chen Nie
- College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Ning Ding
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Qian-Jun Ding
- College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Wei-Ting Ye
- College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Cheng Yang
- School of Computer Science, Beijing University of Posts and Telecommunications, Beijing, China
- Maosong Sun
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- Weinan E
- Center for Machine Learning Research and School of Mathematical Sciences, Peking University; AI for Science Institute, Beijing, China
- Rong Zhu
- College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Zhiyuan Liu
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
32
Salley D, Manzano JS, Kitson PJ, Cronin L. Robotic Modules for the Programmable Chemputation of Molecules and Materials. ACS CENTRAL SCIENCE 2023; 9:1525-1537. [PMID: 37637738 PMCID: PMC10450877 DOI: 10.1021/acscentsci.3c00304] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Indexed: 08/29/2023]
Abstract
Before leveraging big data methods like machine learning and artificial intelligence (AI) in chemistry, there is an imperative need for an affordable, universal digitization standard. This mirrors the foundational requisites of the digital revolution, which demanded standard architectures with precise specifications. Recently, we have developed automated platforms tailored for chemical AI-driven exploration, including the synthesis of molecules, materials, nanomaterials, and formulations. Our focus has been on designing and constructing affordable standard hardware and software modules that serve as a blueprint for chemistry digitization across varied fields. Our platforms can be categorized into four types based on their applications: (i) discovery systems for the exploration of chemical space and novel reactivity, (ii) systems for the synthesis and manufacture of fine chemicals, (iii) platforms for formulation discovery and exploration, and (iv) systems for materials discovery and synthesis. We also highlight the convergent evolution of these platforms through shared hardware, firmware, and software alongside the creation of a unique programming language for chemical and material systems. This programming approach is essential for reliable synthesis, designing experiments, discovery, optimization, and establishing new collaboration standards. Furthermore, it is crucial for verifying literature findings, enhancing experimental outcome reliability, and fostering collaboration and sharing of unsuccessful experiments across different research labs.
Affiliation(s)
- Daniel Salley
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, U.K.
- J. Sebastián Manzano
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, U.K.
- Philip J. Kitson
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, U.K.
- Leroy Cronin
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, U.K.
33
Park NH, Manica M, Born J, Hedrick JL, Erdmann T, Zubarev DY, Adell-Mill N, Arrechea PL. Artificial intelligence driven design of catalysts and materials for ring opening polymerization using a domain-specific language. Nat Commun 2023; 14:3686. [PMID: 37344485 PMCID: PMC10284867 DOI: 10.1038/s41467-023-39396-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/21/2022] [Accepted: 06/12/2023] [Indexed: 06/23/2023]
Abstract
Advances in machine learning (ML) and automated experimentation are poised to vastly accelerate research in polymer science. Data representation is a critical aspect for enabling ML integration in research workflows, yet many data models impose significant rigidity making it difficult to accommodate a broad array of experiment and data types found in polymer science. This inflexibility presents a significant barrier for researchers to leverage their historical data in ML development. Here we show that a domain specific language, termed Chemical Markdown Language (CMDL), provides flexible, extensible, and consistent representation of disparate experiment types and polymer structures. CMDL enables seamless use of historical experimental data to fine-tune regression transformer (RT) models for generative molecular design tasks. We demonstrate the utility of this approach through the generation and the experimental validation of catalysts and polymers in the context of ring-opening polymerization-although we provide examples of how CMDL can be more broadly applied to other polymer classes. Critically, we show how the CMDL tuned model preserves key functional groups within the polymer structure, allowing for experimental validation. These results reveal the versatility of CMDL and how it facilitates translation of historical data into meaningful predictive and generative models to produce experimentally actionable output.
Affiliation(s)
- Matteo Manica
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Jannis Born
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Department of Biosystems Science and Engineering, ETH Zurich, Mattenstrasse 26, 4058, Basel, Switzerland
- James L Hedrick
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Tim Erdmann
- IBM Research-Almaden, 650 Harry Rd., San Jose, CA, 95120, USA
- Nil Adell-Mill
- IBM Research-Zurich, Säumerstrasse 4, Rüschlikon, 8803, Switzerland
- Arctoris, 120E Olympic Avenue, Abingdon, OX14 4SA, Oxfordshire, UK
34
Jablonka K, Rosen AS, Krishnapriyan AS, Smit B. An Ecosystem for Digital Reticular Chemistry. ACS CENTRAL SCIENCE 2023; 9:563-581. [PMID: 37122448 PMCID: PMC10141625 DOI: 10.1021/acscentsci.2c01177] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Indexed: 05/03/2023]
Abstract
The vastness of the materials design space makes it impractical to explore using traditional brute-force methods, particularly in reticular chemistry. However, machine learning has shown promise in expediting and guiding materials design. Despite numerous successful applications of machine learning to reticular materials, progress in the field has stagnated, possibly because digital chemistry is more an art than a science and its limited accessibility to inexperienced researchers. To address this issue, we present mofdscribe, a software ecosystem tailored to novice and seasoned digital chemists that streamlines the ideation, modeling, and publication process. Though optimized for reticular chemistry, our tools are versatile and can be used in nonreticular materials research. We believe that mofdscribe will enable a more reliable, efficient, and comparable field of digital chemistry.
Affiliation(s)
- Kevin Maik Jablonka
- Laboratory of molecular simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Switzerland
- Andrew S. Rosen
- Department of Materials Science and Engineering, University of California, Berkeley, California 94720, United States
- Miller Institute for Basic Research in Science, University of California, Berkeley, California 94720, United States
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Aditi S. Krishnapriyan
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, California 94720, United States
- Department of Electrical Engineering and Computer Science, University of California, Berkeley, California 94720, United States
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Berend Smit
- Laboratory of molecular simulation (LSMO), Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Rue de l’Industrie 17, CH-1951 Sion, Switzerland
35
Lei Z, Ang HT, Wu J. Advanced In-Line Purification Technologies in Multistep Continuous Flow Pharmaceutical Synthesis. Org Process Res Dev 2023. [DOI: 10.1021/acs.oprd.2c00374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/03/2023]
36
Peng X, Wang X. Next-generation intelligent laboratories for materials design and manufacturing. MRS BULLETIN 2023; 48:179-185. [PMID: 36960275 PMCID: PMC9970134 DOI: 10.1557/s43577-023-00481-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/20/2023] [Indexed: 06/18/2023]
Abstract
The contradiction between the importance of materials to modern society and their slow development process has led to the development of multiple methods to accelerate materials discovery. The recently emerged concept of intelligent laboratories integrates the developments in fields of high-throughput experimentation, automation, theoretical computing, and artificial intelligence to form a system that can autonomously carry out designed experiments and make scientific discoveries. We present the basic concepts and the foundations of this new research paradigm, demonstrate its typical application scenarios through case studies, and envision a collaborative human-machine meta laboratory in the future.
Affiliation(s)
- Xiting Peng
- Department of Chemical Engineering, Tsinghua University, Beijing, China
- Xiaonan Wang
- Department of Chemical Engineering, Tsinghua University, Beijing, China
- Key Laboratory of Industrial Biocatalysis (Tsinghua University), Ministry of Education, Beijing, China
37
Tan B, Zhang J, Xiao C, Liu Y, Yang X, Wang W, Li Y, Liu N. Progress of Artificial Intelligence in Drug Synthesis and Prospect of Its Application in Nitrification of Energetic Materials. Molecules 2023; 28:1900. [PMID: 36838887 PMCID: PMC9963094 DOI: 10.3390/molecules28041900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Revised: 02/12/2023] [Accepted: 02/13/2023] [Indexed: 02/19/2023]
Abstract
Artificial intelligence technology shows the advantages of improving efficiency, reducing costs, shortening time, reducing the number of staff on site and achieving precise operations, making impressive research progress in the fields of drug discovery and development, but there are few reports on application in energetic materials. This paper addresses the high safety risks in the current nitrification process of energetic materials, comprehensively analyses and summarizes the main safety risks and their control elements in the nitrification process, proposes possibilities and suggestions for using artificial intelligence technology to enhance the "essential safety" of the nitrification process in energetic materials, reviews the research progress of artificial intelligence in the field of drug synthesis, looks forward to the application prospects of artificial intelligence technology in the nitrification of energetic materials and provides support and guidance for the safe processing of nitrification in the propellants and explosives industry.
Affiliation(s)
- Bojun Tan
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Jing Zhang
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Chuan Xiao
- Academy of Ordnance Science, Beijing 100089, China
- Yingzhe Liu
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Xiong Yang
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Wei Wang
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Yanan Li
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
- Ning Liu
- Xi’an Modern Chemistry Research Institute, Xi’an 710065, China
38
A Review on Artificial Intelligence Enabled Design, Synthesis, and Process Optimization of Chemical Products for Industry 4.0. Processes (Basel) 2023. [DOI: 10.3390/pr11020330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023]
Abstract
With the development of Industry 4.0, artificial intelligence (AI) is gaining increasing attention for its performance in solving particularly complex problems in industrial chemistry and chemical engineering. Therefore, this review provides an overview of the application of AI techniques, in particular machine learning, in chemical design, synthesis, and process optimization over the past years. In this review, the focus is on the application of AI for structure-function relationship analysis, synthetic route planning, and automated synthesis. Finally, we discuss the challenges and future of AI in making chemical products.
39
Kowalski D, MacGregor CM, Long DL, Bell NL, Cronin L. Automated Library Generation and Serendipity Quantification Enables Diverse Discovery in Coordination Chemistry. J Am Chem Soc 2023; 145:2332-2341. [PMID: 36649125 PMCID: PMC9896557 DOI: 10.1021/jacs.2c11066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/18/2023]
Abstract
Library generation experiments are a key part of the discovery of new materials, methods, and models in chemistry, but the question of how to generate high quality libraries to enable discovery is nontrivial. Herein, we use coordination chemistry to demonstrate the automation of many of the workflows used for library generation in automated hardware including the Chemputer. First, we explore the target-oriented synthesis of three influential coordination complexes, to validate key synthetic operations in our system; second, the generation of focused libraries in chemical and process space; and third, the development of a new workflow for prospecting library formation. This involved Bayesian optimization using a Gaussian process as surrogate model combined with a metric for novelty (or serendipity) quantification based on mass spectrometry data. In this way, we show directed exploration of a process space toward those areas with rarer observations and build a picture of the diversity in product distributions present across the space. We show that this effectively "engineers" serendipity into our search through the unexpected appearance of acetic anhydride, formed in situ, and solvent degradation products as ligands in an isolable series of three Co(III) anhydride complexes.
40
Wen M, Spotte-Smith EWC, Blau SM, McDermott MJ, Krishnapriyan AS, Persson KA. Chemical reaction networks and opportunities for machine learning. NATURE COMPUTATIONAL SCIENCE 2023; 3:12-24. [PMID: 38177958 DOI: 10.1038/s43588-022-00369-z] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 06/24/2022] [Accepted: 11/08/2022] [Indexed: 01/06/2024]
Abstract
Chemical reaction networks (CRNs), defined by sets of species and possible reactions between them, are widely used to interrogate chemical systems. To capture increasingly complex phenomena, CRNs can be leveraged alongside data-driven methods and machine learning (ML). In this Perspective, we assess the diverse strategies available for CRN construction and analysis in pursuit of a wide range of scientific goals, discuss ML techniques currently being applied to CRNs and outline future CRN-ML approaches, presenting scientific and technical challenges to overcome.
Affiliation(s)
- Mingjian Wen
- Chemical and Biomolecular Engineering, University of Houston, Houston, TX, USA
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Evan Walter Clark Spotte-Smith
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Samuel M Blau
- Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Matthew J McDermott
- Materials Science Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA
- Aditi S Krishnapriyan
- Computational Research Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, CA, USA
- Electrical Engineering and Computer Science, University of California, Berkeley, Berkeley, CA, USA
- Kristin A Persson
- Materials Science and Engineering, University of California, Berkeley, Berkeley, CA, USA.
- Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
41
McMillan AE, Wu WWX, Nichols PL, Wanner BM, Bode JW. A vending machine for drug-like molecules - automated synthesis of virtual screening hits. Chem Sci 2022; 13:14292-14299. [PMID: 36545137 PMCID: PMC9749103 DOI: 10.1039/d2sc05182f] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/17/2022] [Accepted: 10/27/2022] [Indexed: 12/24/2022]
Abstract
As a result of high false positive rates in virtual screening campaigns, prospective hits must be synthesised for validation. When done manually, this is a time consuming and laborious process. Large "on-demand" virtual libraries (>7 × 10¹² members), suitable for preparation using capsule-based automated synthesis and commercial building blocks, were evaluated to determine their structural novelty. One sub-library, constructed from iSnAP capsules, aldehydes and amines, contains unique scaffolds with drug-like physicochemical properties. Virtual screening hits from this iSnAP library were prepared in an automated fashion for evaluation against Aedes aegypti and Phytophthora infestans. In comparison to manual workflows, this approach provided a 10-fold improvement in user efficiency. A streamlined method of relative stereochemical assignment was also devised to augment the rapid synthesis. User efficiency was further improved to 100-fold by downscaling and parallelising capsule-based chemistry on 96-well plates equipped with filter bases. This work demonstrates that automated synthesis consoles can enable the rapid and reliable preparation of attractive virtual screening hits from large virtual libraries.
Affiliation(s)
- Angus E. McMillan
- Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
- Wilson W. X. Wu
- Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
- Paula L. Nichols
- Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
- Synple Chem AG, Kemptpark 18, Kemptthal 8310, Switzerland
- Jeffrey W. Bode
- Laboratory for Organic Chemistry, Department of Chemistry and Applied Biosciences, ETH Zürich, Zürich 8093, Switzerland
42
Kaisin G, Bovy L, Joyard Y, Maindron N, Tadino V, Monbaliu JCM. A perspective on automated advanced continuous flow manufacturing units for the upgrading of biobased chemicals toward pharmaceuticals. J Flow Chem 2022; 13:1-15. [PMID: 36467977 PMCID: PMC9707424 DOI: 10.1007/s41981-022-00247-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 11/04/2022] [Indexed: 11/30/2022]
Abstract
Biomass is a renewable, almost infinite reservoir of a large diversity of highly functionalized chemicals. The conversion of biomass toward biobased platform molecules through biorefineries generally still lacks economic viability. Profitability could be enhanced through the development of new market opportunities for these biobased platform chemicals. The fine chemical industry, and more specifically the manufacturing of pharmaceuticals is one of the sectors bearing significant potential for these biobased building blocks to rapidly emerge and make a difference. There are, however, still many challenges to be dealt with before this market can thrive. Continuous flow technology and its integration for the upgrading of biobased platform molecules for the manufacturing of pharmaceuticals is foreseen as a game-changer. This perspective reflects on the main challenges relative to chemical, process, regulatory and supply chain-related burdens still to be addressed. The implementation of integrated continuous flow processes and their automation into modular units will help for tackling with these challenges.
Affiliation(s)
- Geoffroy Kaisin
- SynLock SRL, Rue de la Vieille Sambre 153, B-5190 Jemeppe-sur-Sambre, Belgium
- Loïc Bovy
- Center for Integrated Technology and Organic Synthesis, Research Unit MolSys, University of Liège, B-4000 Liège, Sart Tilman, Belgium
- Yoann Joyard
- SynLock SRL, Rue de la Vieille Sambre 153, B-5190 Jemeppe-sur-Sambre, Belgium
- Nicolas Maindron
- SynLock SRL, Rue de la Vieille Sambre 153, B-5190 Jemeppe-sur-Sambre, Belgium
- Vincent Tadino
- SynLock SRL, Rue de la Vieille Sambre 153, B-5190 Jemeppe-sur-Sambre, Belgium
- Jean-Christophe M. Monbaliu
- Center for Integrated Technology and Organic Synthesis, Research Unit MolSys, University of Liège, B-4000 Liège, Sart Tilman, Belgium

43
Wang W, Liu Y, Wang Z, Hao G, Song B. The way to AI-controlled synthesis: how far do we need to go? Chem Sci 2022; 13:12604-12615. [PMID: 36519036 PMCID: PMC9645373 DOI: 10.1039/d2sc04419f] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/08/2022] [Accepted: 09/26/2022] [Indexed: 09/08/2024]
Abstract
Chemical synthesis has always played an irreplaceable role in the chemical, materials, and pharmacological fields. Meanwhile, artificial intelligence (AI) is driving a rapid technological revolution in many fields, and in chemistry it is replacing manual synthesis with a much more economical and time-efficient approach. However, the rate-determining step of AI-controlled synthesis systems is rarely discussed, which makes these systems difficult to apply in general laboratories. Here, we review and summarize the history of AI-aided synthesis. We propose that the hardware of AI-controlled synthesis systems should be more adaptable, so that it can execute reactions with reagents in different phases and under different reaction conditions, and that the software should include a richer variety of reaction-prediction modules. Such an updated system would be able to handle a wider range of syntheses. Our viewpoint could help scientists advance the revolution that combines AI and synthesis to achieve more progress in complicated systems.
Affiliation(s)
- Wei Wang
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Yingwei Liu
- State Key Laboratory of Public Big Data, Guizhou University Guiyang 550025 P. R. China
- Zheng Wang
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Gefei Hao
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China
- Baoan Song
- State Key Laboratory Breeding Base of Green Pesticide and Agricultural Bioengineering, Key Laboratory of Green Pesticide and Agricultural Bioengineering, Ministry of Education, Research and Development Center for Fine Chemicals, Guizhou University Guiyang 550025 P. R. China

44
Angello NH, Rathore V, Beker W, Wołos A, Jira ER, Roszak R, Wu TC, Schroeder CM, Aspuru-Guzik A, Grzybowski BA, Burke MD. Closed-loop optimization of general reaction conditions for heteroaryl Suzuki-Miyaura coupling. Science 2022; 378:399-405. [DOI: 10.1126/science.adc8743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Abstract
General conditions for organic reactions are important but rare, and efforts to identify them usually consider only narrow regions of chemical space. Discovering more general reaction conditions requires considering vast regions of chemical space derived from a large matrix of substrates crossed with a high-dimensional matrix of reaction conditions, rendering exhaustive experimentation impractical. Here, we report a simple closed-loop workflow that leverages data-guided matrix down-selection, uncertainty-minimizing machine learning, and robotic experimentation to discover general reaction conditions. Application to the challenging and consequential problem of heteroaryl Suzuki-Miyaura cross-coupling identified conditions that double the average yield relative to a widely used benchmark that was previously developed using traditional approaches. This study provides a practical road map for solving multidimensional chemical optimization problems with large search spaces.
Affiliation(s)
- Nicholas H. Angello
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Vandana Rathore
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Agnieszka Wołos
- Allchemy, Inc., Highland, IN, USA
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Edward R. Jira
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Rafał Roszak
- Allchemy, Inc., Highland, IN, USA
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Tony C. Wu
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Charles M. Schroeder
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Department of Materials Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, Canada
- Canadian Institute for Advanced Research, Toronto, ON, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, Canada
- Bartosz A. Grzybowski
- Allchemy, Inc., Highland, IN, USA
- Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
- Center for Soft and Living Matter, Institute for Basic Science, Ulsan, Republic of Korea
- Department of Chemistry, Ulsan Institute of Science and Technology, Ulsan, Republic of Korea
- Martin D. Burke
- Department of Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Cancer Center at Illinois, University of Illinois at Urbana-Champaign, Urbana, IL, USA
- Carle Illinois College of Medicine, University of Illinois at Urbana-Champaign, Urbana, IL, USA

45
Shi X, Wang Q, Wang C, Wang R, Zheng L, Qian C, Tang W. An AI-Based Curling Game System for Winter Olympics. RESEARCH 2022; 2022:9805054. [PMID: 36349338 PMCID: PMC9639444 DOI: 10.34133/2022/9805054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2022] [Accepted: 09/26/2022] [Indexed: 12/02/2022]
Abstract
The real-time application of artificial intelligence (AI) technologies in sports is a long-standing challenge owing to the large spatial extent of sports fields and the complexity and uncertainty of real-world environments. Although some AI-based systems have been applied to sporting events such as tennis, basketball, and football, they are used for replays after the game rather than in real time. Here, we present an AI-based curling game system, termed CurlingHunter, which displays the actual and predicted trajectories of curling stones and the house regions in real time during games, via a giant screen in curling stadiums and a live-streaming platform on the internet, to assist play, make games more engaging to watch, and help athletes train. We provide a complete description of CurlingHunter's architecture and a thorough evaluation of its performance, and demonstrate that CurlingHunter offers remarkable real-time performance (~9.005 ms), high accuracy (30 ± 3 cm at measurement distances > 20 m), and good stability. To the best of our knowledge, CurlingHunter is the first real-time system in the history of sports that can assist athletes during competition, and it has been successfully applied at the Winter Olympics and Winter Paralympics. Our work highlights the potential of AI-based systems for real-time applications in sports.
Affiliation(s)
- Xuanke Shi
- SenseTime Research, Beijing 100080, China
- Quan Wang
- SenseTime Research, Beijing 100080, China
- Chao Wang
- SenseTime Research, Beijing 100080, China
- Rui Wang
- SenseTime Research, Beijing 100080, China
- Chen Qian
- SenseTime Research, Beijing 100080, China
- Wei Tang
- SenseTime Research, Beijing 100080, China
- State Key Laboratory of Fluid Power and Mechatronic Systems, Zhejiang University, Hangzhou 310027, China

46
Wang J, Shen Z, Liao Y, Yuan Z, Li S, He G, Lan M, Qian X, Zhang K, Li H. Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space. Brief Bioinform 2022; 23:6761958. [PMID: 36252922 PMCID: PMC9677486 DOI: 10.1093/bib/bbac461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/31/2022] [Revised: 09/21/2022] [Accepted: 09/26/2022] [Indexed: 12/14/2022]
Abstract
Identification of new chemical compounds with desired structural diversity and biological properties plays an essential role in drug discovery, yet constructing such a potential space with elements of 'near-drug' properties is still a challenging task. In this work, we propose a multimodal chemical information reconstruction system to automatically process, extract and align heterogeneous information from the text descriptions and structural images of chemical patents. Our key innovation lies in a heterogeneous data generator that produces cross-modality training data in the form of text descriptions and Markush structure images, from which a two-branch model with image- and text-processing units can learn both to recognize heterogeneous chemical entities and to capture their correspondence. In particular, we collected chemical structures from the ChEMBL database and chemical patents from the European Patent Office and the US Patent and Trademark Office using the keywords 'A61P, compound, structure' for the years 2010 to 2020, and generated heterogeneous chemical information datasets with 210K structural images and 7818 annotated text snippets. Based on the reconstructed results and substituent replacement rules, structural libraries of a large number of near-drug compounds can be generated automatically. In quantitative evaluations, our model correctly reconstructs 97% of the molecular images into a structured format and achieves an F1-score of around 97-98% in the recognition of chemical entities, demonstrating its effectiveness for automatic information extraction from chemical patents. We hope this will help transform such patents into a user-friendly, structured molecular database that enriches the near-drug space and enables intelligent retrieval of chemical knowledge.
Affiliation(s)
- Yichen Liao
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
- Zhen Yuan
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
- Shiliang Li
- Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science & Technology, Shanghai 200237, China
- Gaoqi He
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
- Man Lan
- School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
- Xuhong Qian
- Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China
- Kai Zhang
- Corresponding authors: Kai Zhang, School of Computer Science and Technology, Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail: ; Honglin Li, Shanghai Key Laboratory of New Drug Design, East China University of Science & Technology, Shanghai 200237, China. Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail:
- Honglin Li
- Corresponding authors: Kai Zhang, School of Computer Science and Technology, Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail: ; Honglin Li, Shanghai Key Laboratory of New Drug Design, East China University of Science & Technology, Shanghai 200237, China. Innovation Center for AI and Drug Discovery, East China Normal University, Shanghai 200062, China. E-mail:

47
Chemistry-informed molecular graph as reaction descriptor for machine-learned retrosynthesis planning. Proc Natl Acad Sci U S A 2022; 119:e2212711119. [PMID: 36191228 PMCID: PMC9564830 DOI: 10.1073/pnas.2212711119] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 11/18/2022]
Abstract
Infusing "chemical wisdom" should improve the data-driven approaches that rely exclusively on historical synthetic data for automatic retrosynthesis planning. For this purpose, we designed a chemistry-informed molecular graph (CIMG) to describe chemical reactions. A collection of key information that is most relevant to chemical reactions is integrated in CIMG: NMR chemical shifts as vertex features, bond dissociation energies as edge features, and solvent/catalyst information as global features. For any given compound as a target, a product CIMG is generated and exploited by a graph neural network (GNN) model to choose reaction template(s) leading to this product. A reactant CIMG is then inferred and used in two GNN models to select appropriate catalyst and solvent, respectively. Finally, a fourth GNN model compares the two CIMG descriptors to check the plausibility of the proposed reaction. A reaction vector is obtained for every molecule in training these models. The chemical wisdom of reaction propensity contained in the pretrained reaction vectors is exploited to autocategorize molecules/reactions and to accelerate Monte Carlo tree search (MCTS) for multistep retrosynthesis planning. Full synthetic routes with recommended catalysts/solvents are predicted efficiently using this CIMG-based approach.

48
Jiang Y, Salley D, Sharma A, Keenan G, Mullin M, Cronin L. An artificial intelligence enabled chemical synthesis robot for exploration and optimization of nanomaterials. SCIENCE ADVANCES 2022; 8:eabo2626. [PMID: 36206340 PMCID: PMC9544322 DOI: 10.1126/sciadv.abo2626] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 01/23/2022] [Accepted: 08/23/2022] [Indexed: 05/19/2023]
Abstract
We present an autonomous chemical synthesis robot for the exploration, discovery, and optimization of nanostructures driven by real-time spectroscopic feedback, theory, and machine learning algorithms that control the reaction conditions and allow the selective templating of reactions. This approach allows materials to be transferred as seeds between cycles of exploration, opening up the search space much as gene transfer does in biology. The open-ended exploration of the seed-mediated multistep synthesis of gold nanoparticles (AuNPs) via in-line ultraviolet-visible characterization led to the discovery of five categories of nanoparticles while performing only ca. 1000 experiments in three hierarchically linked chemical spaces. The platform optimized nanostructures with desired optical properties by combining experiments and extinction spectrum simulations to achieve a yield of up to 95%. The synthetic procedure is output in a universal format using the chemical description language (χDL), together with analytical data, to produce a unique digital signature that enables the synthesis to be reproduced.
Affiliation(s)
- Yibin Jiang
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Daniel Salley
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Abhishek Sharma
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Graham Keenan
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Margaret Mullin
- Glasgow Imaging Facility, Institute of Infection Immunity and Inflammation, College of Medical Veterinary and Life Sciences, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Leroy Cronin
- School of Chemistry, University of Glasgow, University Avenue, Glasgow G12 8QQ, UK
- Corresponding author.

49
Zhu Q, Zhang F, Huang Y, Xiao H, Zhao L, Zhang X, Song T, Tang X, Li X, He G, Chong B, Zhou J, Zhang Y, Zhang B, Cao J, Luo M, Wang S, Ye G, Zhang W, Chen X, Cong S, Zhou D, Li H, Li J, Zou G, Shang W, Jiang J, Luo Y. An all-round AI-Chemist with a scientific mind. Natl Sci Rev 2022; 9:nwac190. [PMID: 36415316 PMCID: PMC9674120 DOI: 10.1093/nsr/nwac190] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/02/2022] [Revised: 08/25/2022] [Accepted: 08/29/2022] [Indexed: 12/03/2022]
Abstract
The realization of automated chemical experiments by robots marked the prelude to the artificial intelligence (AI) laboratory. Several AI-based systems or robots with specific chemical skills have been demonstrated, but conducting all-round scientific research remains challenging. Here, we present an all-round AI-Chemist equipped with scientific data intelligence that is capable of performing the basic tasks generally required in chemical research. Based on a service platform, the AI-Chemist can automatically read the literature from a cloud database and propose experimental plans accordingly. It can control a mobile robot, in-house or online, to automatically execute the complete experimental process on 14 workstations, including synthesis, characterization and performance tests. The experimental data can be analysed simultaneously by the computational brain of the AI-Chemist through machine learning and Bayesian optimization, allowing a new hypothesis to be proposed for the next iteration. The competence of the AI-Chemist has been tested on three different chemical tasks. In the future, more advanced all-round AI-Chemists equipped with scientific data intelligence may change the landscape of the chemical laboratory.
Affiliation(s)
- Qing Zhu
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Fei Zhang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Yan Huang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Hengyu Xiao
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- LuYuan Zhao
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- XuChun Zhang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Tao Song
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- XinSheng Tang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Xiang Li
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Guo He
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- BaoChen Chong
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- JunYi Zhou
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- YiHan Zhang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Baicheng Zhang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- JiaQi Cao
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Man Luo
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Song Wang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- GuiLin Ye
- Hefei JiShu Quantum Technology Co. Ltd, Hefei 230026, China
- WanJun Zhang
- Hefei JiShu Quantum Technology Co. Ltd, Hefei 230026, China
- Xin Chen
- Hefei JiShu Quantum Technology Co. Ltd, Hefei 230026, China
- Shuang Cong
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Donglai Zhou
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Huirong Li
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Jialei Li
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Gang Zou
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- WeiWei Shang
- School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Jun Jiang
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China
- Yi Luo
- Hefei National Research Center for Physical Sciences at the Microscale, School of Chemistry and Materials Science, University of Science and Technology of China, Hefei 230026, China
- Hefei National Laboratory, University of Science and Technology of China, Hefei 230088, China

50
Seifrid M, Pollice R, Aguilar-Granda A, Morgan Chan Z, Hotta K, Ser CT, Vestfrid J, Wu TC, Aspuru-Guzik A. Autonomous Chemical Experiments: Challenges and Perspectives on Establishing a Self-Driving Lab. Acc Chem Res 2022; 55:2454-2466. [PMID: 35948428 PMCID: PMC9454899 DOI: 10.1021/acs.accounts.2c00220] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 04/11/2022] [Indexed: 01/19/2023]
Abstract
We must accelerate the pace at which we make technological advancements to address climate change and disease risks worldwide. This swifter pace of discovery requires faster research and development cycles enabled by better integration between hypothesis generation, design, experimentation, and data analysis. Typical research cycles take months to years. However, data-driven automated laboratories, or self-driving laboratories, can significantly accelerate molecular and materials discovery. Recently, substantial advancements have been made in the areas of machine learning and optimization algorithms that have allowed researchers to extract valuable knowledge from multidimensional data sets. Machine learning models can be trained on large data sets from the literature or databases, but their performance can often be hampered by a lack of negative results or metadata. In contrast, data generated by self-driving laboratories can be information-rich, containing precise details of the experimental conditions and metadata. Consequently, much larger amounts of high-quality data are gathered in self-driving laboratories. When placed in open repositories, this data can be used by the research community to reproduce experiments, for more in-depth analysis, or as the basis for further investigation. Accordingly, high-quality open data sets will increase the accessibility and reproducibility of science, which is sorely needed.

In this Account, we describe our efforts to build a self-driving lab for the development of a new class of materials: organic semiconductor lasers (OSLs). Since they have only recently been demonstrated, little is known about the molecular and material design rules for thin-film, electrically-pumped OSL devices as compared to other technologies such as organic light-emitting diodes or organic photovoltaics. To realize high-performing OSL materials, we are developing a flexible system for automated synthesis via iterative Suzuki-Miyaura cross-coupling reactions. This automated synthesis platform is directly coupled to the analysis and purification capabilities. Subsequently, the molecules of interest can be transferred to an optical characterization setup. We are currently limited to optical measurements of the OSL molecules in solution. However, material properties are ultimately most important in the solid state (e.g., as a thin-film device). To that end and for a different scientific goal, we are developing a self-driving lab for inorganic thin-film materials focused on the oxygen evolution reaction.

While the future of self-driving laboratories is very promising, numerous challenges still need to be overcome. These challenges can be split into cognition and motor function. Generally, the cognitive challenges are related to optimization with constraints or unexpected outcomes for which general algorithmic solutions have yet to be developed. A more practical challenge that could be resolved in the near future is that of software control and integration because few instrument manufacturers design their products with self-driving laboratories in mind. Challenges in motor function are largely related to handling heterogeneous systems, such as dispensing solids or performing extractions. As a result, it is critical to understand that adapting experimental procedures that were designed for human experimenters is not as simple as transferring those same actions to an automated system, and there may be more efficient ways to achieve the same goal in an automated fashion. Accordingly, for self-driving laboratories, we need to carefully rethink the translation of manual experimental protocols.
Affiliation(s)
- Martin Seifrid
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Robert Pollice
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Zamyla Morgan Chan
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Acceleration Consortium, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Kazuhiro Hotta
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Science & Innovation Center, Mitsubishi Chemical Corporation, 1000 Kamoshidacho, Aoba, Yokohama, Kanagawa 227-8502, Japan
- Cher Tian Ser
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Jenya Vestfrid
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Tony C. Wu
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3H6, Canada
- Department of Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Department of Materials Science, University of Toronto, Toronto, Ontario M5S 3E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario M5S 1M1, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research, Toronto, Ontario M5S 1M1, Canada