1. Sletten ET, Wolf JB, Danglad‐Flores J, Seeberger PH. Carbohydrate Synthesis is Entering the Data-Driven Digital Era. Chemistry 2025; 31:e202500289. [PMID: 40178205 PMCID: PMC12080308 DOI: 10.1002/chem.202500289] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2025] [Revised: 03/27/2025] [Accepted: 03/28/2025] [Indexed: 04/05/2025]
Abstract
Glycans are vital in biological processes, but their nontemplated, heterogeneous structures complicate structure-function analyses. Glycosylation, the key reaction in synthetic glycochemistry, remains not entirely predictable due to its complex mechanism and the need for protecting groups that impact reaction outcomes. This concept highlights recent advancements in glycochemistry and emphasizes the integration of digital tools, including automation, computational modelling, and data management, to improve carbohydrate synthesis and support further progress in the field.
Affiliation(s)
- Eric T. Sletten
  - Max Planck Institute of Colloids and Interfaces, Potsdam Science Park, Am Mühlenberg 1, 14476 Potsdam, Germany
- Jakob B. Wolf
  - Max Planck Institute of Colloids and Interfaces, Potsdam Science Park, Am Mühlenberg 1, 14476 Potsdam, Germany
  - Institut für Chemie, Biochemie und Pharmazie, Freie Universität Berlin, Takusstraße 3, 14195 Berlin, Germany
- José Danglad‐Flores
  - Max Planck Institute of Colloids and Interfaces, Potsdam Science Park, Am Mühlenberg 1, 14476 Potsdam, Germany
- Peter H. Seeberger
  - Max Planck Institute of Colloids and Interfaces, Potsdam Science Park, Am Mühlenberg 1, 14476 Potsdam, Germany
  - Institut für Chemie, Biochemie und Pharmazie, Freie Universität Berlin, Takusstraße 3, 14195 Berlin, Germany
2. Kim S, Schrier J, Jung Y. Explainable Synthesizability Prediction of Inorganic Crystal Polymorphs Using Large Language Models. Angew Chem Int Ed Engl 2025; 64:e202423950. [PMID: 39943898 PMCID: PMC12051750 DOI: 10.1002/anie.202423950] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2024] [Revised: 01/24/2025] [Accepted: 02/12/2025] [Indexed: 03/23/2025]
Abstract
We evaluate the ability of machine learning to predict whether a hypothetical crystal structure can be synthesized and explain those predictions to scientists. Fine-tuned large language models (LLMs) trained on a human-readable text description of the target crystal structure perform comparably to previous bespoke convolutional graph neural network methods, but better prediction quality can be achieved by training a positive-unlabeled learning model on a text-embedding representation of the structure. An LLM-based workflow can then be used to generate human-readable explanations for the types of factors governing synthesizability, extract the underlying physical rules, and assess the veracity of those rules. These explanations can guide chemists in modifying or optimizing non-synthesizable hypothetical structures to make them more feasible for materials design.
Affiliation(s)
- Seongmin Kim
  - Department of Chemical and Biological Engineering (BK21 four), Seoul National University, 1 Gwanak‐ro, Gwanak‐gu, Seoul 08826, South Korea
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak‐ro, Yuseong‐gu, Daejeon 34141, South Korea
- Joshua Schrier
  - Department of Chemistry and Biochemistry, Fordham University, 441 E. Fordham Road, The Bronx, New York 10458, USA
- Yousung Jung
  - Department of Chemical and Biological Engineering (BK21 four), Seoul National University, 1 Gwanak‐ro, Gwanak‐gu, Seoul 08826, South Korea
  - Institute of Chemical Processes, Seoul National University, 1 Gwanak‐ro, Gwanak‐gu, Seoul 08826, South Korea
  - Institute of Engineering Research, Seoul National University, 1 Gwanak‐ro, Gwanak‐gu, Seoul 08826, South Korea
3. Ekosso C, Liu H, Glagovich A, Nguyen D, Maurer S, Schrier J. Accelerating the Discovery of Abiotic Vesicles with AI-Guided Automated Experimentation. Langmuir 2025; 41:858-867. [PMID: 39810357 DOI: 10.1021/acs.langmuir.4c04181] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/16/2025]
Abstract
The first protocells are speculated to have arisen from the self-assembly of simple abiotic carboxylic acids, alcohols, and other amphiphiles into vesicles. To study the complex process of vesicle formation, we combined laboratory automation with AI-guided experimentation to accelerate the discovery of specific compositions and underlying principles governing vesicle formation. Using a low-cost commercial liquid handling robot, we automated experimental procedures, enabling high-throughput testing of various reaction conditions for mixtures of seven (7) amphiphiles. Multitemplate multiscale template matching (MMTM) was used to automate confocal microscopy image analysis, enabling us to quantify vesicle formation without tedious manual counting. The results were used to create a Gaussian process surrogate model, and then active learning was used to iteratively direct the laboratory experiments to reduce model uncertainty. Mixtures containing primarily trimethyl decylammonium and decylsulfate in equal amounts formed vesicles at submillimolar critical vesicle concentrations, and more than 20% glycerol monodecanoate prevented vesicles from forming even at high total amphiphile concentrations.
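The surrogate-model loop described in this abstract can be illustrated with a minimal sketch. It assumes a single one-dimensional composition variable, an RBF kernel with a fixed length scale, and a maximum-variance acquisition rule. None of these choices is stated in the abstract, and every function name here is hypothetical.

```python
import math


def rbf(a, b, length=0.2):
    """Squared-exponential kernel; the length scale is an assumed value."""
    return math.exp(-((a - b) ** 2) / (2 * length ** 2))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def posterior_variance(x_new, x_train, noise=1e-6):
    """GP predictive variance at x_new given already-measured inputs x_train."""
    gram = [
        [rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    k_star = [rbf(x_new, a) for a in x_train]
    alpha = solve(gram, k_star)
    return rbf(x_new, x_new) - sum(k * a for k, a in zip(k_star, alpha))


def next_experiment(candidates, x_train):
    """Pick the candidate where the surrogate is least certain."""
    return max(candidates, key=lambda x: posterior_variance(x, x_train))


# Two compositions measured so far (0.0 and 1.0); three untested candidates.
print(next_experiment([0.1, 0.5, 0.9], [0.0, 1.0]))  # -> 0.5
```

The predictive variance of a Gaussian process depends only on where measurements were taken, not on the measured values, so this selection step can run before the new data are analysed.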
Affiliation(s)
- Christelle Ekosso
  - Department of Chemistry and Biochemistry, Central Connecticut State University, 1615 Stanley Street, New Britain, Connecticut 06050, United States
- Hao Liu
  - Department of Chemistry and Biochemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
- Avery Glagovich
  - Department of Chemistry and Biochemistry, Central Connecticut State University, 1615 Stanley Street, New Britain, Connecticut 06050, United States
- Dustin Nguyen
  - Department of Chemistry and Biochemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
- Sarah Maurer
  - Department of Chemistry and Biochemistry, Central Connecticut State University, 1615 Stanley Street, New Britain, Connecticut 06050, United States
- Joshua Schrier
  - Department of Chemistry and Biochemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
4. Liu Y, Ping M, Han J, Cheng X, Qin H, Wang W. Neural Network Methods in the Development of MEMS Sensors. Micromachines 2024; 15:1368. [PMID: 39597178 PMCID: PMC11596212 DOI: 10.3390/mi15111368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Revised: 11/06/2024] [Accepted: 11/11/2024] [Indexed: 11/29/2024]
Abstract
As a kind of long-term favorable device, the microelectromechanical system (MEMS) sensor has become a powerful dominator in the detection applications of commercial and industrial areas. There have been a series of mature solutions to address the possible issues in device design, optimization, fabrication, and output processing. The recent involvement of neural networks (NNs) has provided a new paradigm for the development of MEMS sensors and greatly accelerated the research cycle of high-performance devices. In this paper, we present an overview of the progress, applications, and prospects of NN methods in the development of MEMS sensors. The superiority of leveraging NN methods in structural design, device fabrication, and output compensation/calibration is reviewed and discussed to illustrate how NNs have reformed the development of MEMS sensors. Relevant issues in the usage of NNs, such as available models, dataset construction, and parameter optimization, are presented. Many application scenarios have demonstrated that NN methods can enhance the speed of predicting device performance, rapidly generate device-on-demand solutions, and establish more accurate calibration and compensation models. Along with the improvement in research efficiency, there are also several critical challenges that need further exploration in this area.
Affiliation(s)
- Yan Liu
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
- Mingda Ping
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
- Jizhou Han
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
- Xiang Cheng
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
- Hongbo Qin
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
- Weidong Wang
  - School of Mechano-Electronic Engineering, Xidian University, Xi’an 710071, China
  - CityU-Xidian Joint Laboratory of Micro/Nano Manufacturing, Shenzhen 518057, China
5. Kim S, Jung Y, Schrier J. Large Language Models for Inorganic Synthesis Predictions. J Am Chem Soc 2024; 146:19654-19659. [PMID: 38991051 DOI: 10.1021/jacs.4c05840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/13/2024]
Abstract
We evaluate the effectiveness of pretrained and fine-tuned large language models (LLMs) for predicting the synthesizability of inorganic compounds and the selection of precursors needed to perform inorganic synthesis. The predictions of fine-tuned LLMs are comparable to─and sometimes better than─recent bespoke machine learning models for these tasks but require only minimal user expertise, cost, and time to develop. Therefore, this strategy can serve both as an effective and strong baseline for future machine learning studies of various chemical applications and as a practical tool for experimental chemists.
Affiliation(s)
- Seongmin Kim
  - Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291, Daehak-ro, Yuseong-gu, Daejeon 34141, Korea
- Yousung Jung
  - Department of Chemical and Biological Engineering (BK21 four), Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
  - Institute of Chemical Processes, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
  - Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
  - Institute of Engineering Research, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Korea
- Joshua Schrier
  - Department of Chemistry and Biochemistry, Fordham University, 441 East Fordham Road, The Bronx, New York 10458, United States
6. Kim MA, Ai Q, Norquist AJ, Schrier J, Chan EM. Active Learning of Ligands That Enhance Perovskite Nanocrystal Luminescence. ACS Nano 2024; 18:14514-14522. [PMID: 38776469 DOI: 10.1021/acsnano.4c02094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2024]
Abstract
Ligands play a critical role in the optical properties and chemical stability of colloidal nanocrystals (NCs), but identifying ligands that can enhance NC properties is daunting, given the high dimensionality of chemical space. Here, we use machine learning (ML) and robotic screening to accelerate the discovery of ligands that enhance the photoluminescence quantum yield (PLQY) of CsPbBr3 perovskite NCs. We developed a ML model designed to predict the relative PL enhancement of perovskite NCs when coordinated with a ligand selected from a pool of 29,904 candidate molecules. Ligand candidates were selected using an active learning (AL) approach that accounted for uncertainty quantified by twin regressors. After eight experimental iterations of batch AL (corresponding to 21 initial and 72 model-recommended ligands), the uncertainty of the model decreased, demonstrating an increased confidence in the model predictions. Feature importance and counterfactual analyses of model predictions illustrate the potential use of ligand field strength in designing PL-enhancing ligands. Our versatile AL framework can be readily adapted to screen the effect of ligands on a wide range of colloidal nanomaterials.
Affiliation(s)
- Min A Kim
  - The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Qianxiang Ai
  - Department of Chemistry and Biochemistry, Fordham University, 441 E. Fordham Rd, The Bronx, New York 10458, United States
- Alexander J Norquist
  - Department of Chemistry, Haverford College, 370 Lancaster Ave, Haverford, Pennsylvania 19041, United States
- Joshua Schrier
  - Department of Chemistry and Biochemistry, Fordham University, 441 E. Fordham Rd, The Bronx, New York 10458, United States
- Emory M Chan
  - The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
7. Chen LX, Yano J. Deciphering Photoinduced Catalytic Reaction Mechanisms in Natural and Artificial Photosynthetic Systems on Multiple Temporal and Spatial Scales Using X-ray Probes. Chem Rev 2024; 124:5421-5469. [PMID: 38663009 DOI: 10.1021/acs.chemrev.3c00560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2024]
Abstract
Utilization of renewable energies for catalytically generating value-added chemicals is highly desirable in this era of rising energy demands and climate change impacts. Artificial photosynthetic systems or photocatalysts utilize light to convert abundant CO2, H2O, and O2 to fuels, such as carbohydrates and hydrogen, thus converting light energy to storable chemical resources. The emergence of intense X-ray pulses from synchrotrons, ultrafast X-ray pulses from X-ray free electron lasers, and table-top laser-driven sources over the past decades opens new frontiers in deciphering photoinduced catalytic reaction mechanisms on the multiple temporal and spatial scales. Operando X-ray spectroscopic methods offer a new set of electronic transitions in probing the oxidation states, coordinating geometry, and spin states of the metal catalytic center and photosensitizers with unprecedented energy and time resolution. Operando X-ray scattering methods enable previously elusive reaction steps to be characterized on different length scales and time scales. The methodological progress and their application examples collected in this review will offer a glimpse into the accomplishments and current state in deciphering reaction mechanisms for both natural and synthetic systems. Looking forward, there are still many challenges and opportunities at the frontier of catalytic research that will require further advancement of the characterization techniques.
Affiliation(s)
- Lin X Chen
  - Chemical Science and Engineering Division, Argonne National Laboratory, Lemont, Illinois 60439, United States
  - Department of Chemistry, Northwestern University, Evanston, Illinois 60208, United States
- Junko Yano
  - Molecular Biophysics & Integrated Bioimaging Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
8. Duke R, McCoy R, Risko C, Bursten JRS. Promises and Perils of Big Data: Philosophical Constraints on Chemical Ontologies. J Am Chem Soc 2024; 146:11579-11591. [PMID: 38640489 DOI: 10.1021/jacs.3c11399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2024]
Abstract
Chemistry is experiencing a paradigm shift in the way it interacts with data. So-called "big data" are collected and used at unprecedented scales with the idea that algorithms can be designed to aid in chemical discovery. As data-enabled practices become ever more ubiquitous, chemists must consider the organization and curation of their data, especially as it is presented to both humans and increasingly intelligent algorithms. One of the most promising organizational schemes for big data is a construct termed an ontology. In data science, ontologies are systems that represent relations among objects and properties in a domain of discourse. As chemistry encounters larger and larger data sets, the ontologies that support chemical research will likewise increase in complexity, and the future of chemistry will be shaped by the choices made in developing big data chemical ontologies. How such ontologies will work should therefore be a subject of significant attention in the chemical community. Now is the time for chemists to ask questions about ontology design and use: How should chemical data be organized? What can be reasonably expected from an organizational structure? Is a universal ontology tenable? As some of these questions may be new to chemists, we recommend an interdisciplinary approach that draws on the long history of philosophers of science asking questions about the organization of scientific concepts, constructs, models, and theories. This Perspective presents insights from these long-standing studies and initiates new conversations between chemists and philosophers.
Affiliation(s)
- Rebekah Duke
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States
- Ryan McCoy
  - Department of Philosophy, University of Kentucky, Lexington, Kentucky 40508, United States
- Chad Risko
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, United States
- Julia R S Bursten
  - Department of Philosophy, University of Kentucky, Lexington, Kentucky 40508, United States
9. Grazioli G, Tao A, Bhatia I, Regan P. Genetic Algorithm for Automated Parameterization of Network Hamiltonian Models of Amyloid Fibril Formation. J Phys Chem B 2024; 128:1854-1865. [PMID: 38359362 PMCID: PMC10910512 DOI: 10.1021/acs.jpcb.3c07322] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2023] [Revised: 01/07/2024] [Accepted: 02/05/2024] [Indexed: 02/17/2024]
Abstract
The time scales of long-time atomistic molecular dynamics simulations are typically reported in microseconds, while the time scales for experiments studying the kinetics of amyloid fibril formation are typically reported in minutes or hours. This time scale deficit of roughly 9 orders of magnitude presents a major challenge in the design of computer simulation methods for studying protein aggregation events. Coarse-grained molecular simulations offer a computationally tractable path forward for exploring the molecular mechanism driving the formation of these structures, which are implicated in diseases such as Alzheimer's, Parkinson's, and type-II diabetes. Network Hamiltonian models of aggregation are centered around a Hamiltonian function that returns the total energy of a system of aggregating proteins, given the graph structure of the system as an input. In the graph, or network, representation of the system, each protein molecule is represented as a node, and noncovalent bonds between proteins are represented as edges. The parameter set, i.e., the coefficients that determine the degree to which each topological degree of freedom is favored or disfavored, must be determined for each network Hamiltonian model, and doing so is a well-known technical challenge. The methodology is first demonstrated by beginning with an initial set of randomly parametrized models of low fibril fraction (<5% fibrillar), and evolving to subsequent generations of models, ultimately leading to high fibril fraction models (>70% fibrillar). The methodology is also demonstrated by applying it to optimizing previously published network Hamiltonian models for the 5 key amyloid fibril topologies that have been reported in the Protein Data Bank (PDB). The models generated by the AI produced fibril fractions that surpass previously published fibril fractions in 3 of 5 cases, including the most naturally abundant amyloid fibril topology, the 1,2 2-ribbon, which features a steric zipper.
The authors also aim to encourage more widespread use of the network Hamiltonian methodology for fitting a wide variety of self-assembling systems by releasing a free open-source implementation of the genetic algorithm introduced here.
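The Hamiltonian described in this abstract, a function mapping a graph of proteins (nodes) and noncovalent bonds (edges) to a total energy through a set of coefficients, can be sketched as follows. The two graph statistics used (edge count and two-star count), the sign convention, and the function names are illustrative assumptions. They are not taken from the paper or its released software.

```python
def graph_statistics(n_nodes, edges):
    """Count simple topological features of an undirected graph.

    The choice of features (edges, two-stars) is an assumption made for
    illustration; the paper's actual feature set is not given in the abstract.
    """
    degree = [0] * n_nodes
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # A two-star is a pair of edges sharing one node: C(degree, 2) per node.
    two_stars = sum(d * (d - 1) // 2 for d in degree)
    return {"edges": len(edges), "two_stars": two_stars}


def network_hamiltonian(n_nodes, edges, coefficients):
    """Total energy as a coefficient-weighted sum of graph statistics."""
    stats = graph_statistics(n_nodes, edges)
    return sum(coefficients[name] * value for name, value in stats.items())


# Three proteins bonded in a chain: 0-1-2.
chain = [(0, 1), (1, 2)]
# Hypothetical coefficients: bonds are favored, branching is penalized.
theta = {"edges": -1.0, "two_stars": 0.5}
print(network_hamiltonian(3, chain, theta))  # -> -1.5
```

In this picture, parameterization means searching over the values in `theta`, which is the step the abstract assigns to the genetic algorithm.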
Affiliation(s)
- Gianmarc Grazioli
  - Department of Chemistry, San José State University, San Jose, California 95192, United States
- Andy Tao
  - Department of Chemistry, San José State University, San Jose, California 95192, United States
- Inika Bhatia
  - Department of Chemistry, San José State University, San Jose, California 95192, United States
- Patrick Regan
  - Department of Chemistry, San José State University, San Jose, California 95192, United States
10. Mayo Yanes E, Chakraborty S, Gershoni-Poranne R. COMPAS-2: a dataset of cata-condensed hetero-polycyclic aromatic systems. Sci Data 2024; 11:97. [PMID: 38242917 PMCID: PMC10799083 DOI: 10.1038/s41597-024-02927-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/19/2023] [Accepted: 01/05/2024] [Indexed: 01/21/2024] Open
Abstract
Polycyclic aromatic systems are highly important to numerous applications, in particular to organic electronics and optoelectronics. High-throughput screening and generative models that can help to identify new molecules to advance these technologies require large amounts of high-quality data, which is expensive to generate. In this report, we present the largest freely available dataset of geometries and properties of cata-condensed poly(hetero)cyclic aromatic molecules calculated to date. Our dataset contains ~500k molecules comprising 11 types of aromatic and antiaromatic building blocks calculated at the GFN1-xTB level and is representative of a highly diverse chemical space. We detail the structure enumeration process and the methods used to provide various electronic properties (including HOMO-LUMO gap, adiabatic ionization potential, and adiabatic electron affinity). Additionally, we benchmark against a ~50k dataset calculated at the CAM-B3LYP-D3BJ/def2-SVP level and develop a fitting scheme to correct the xTB values to higher accuracy. These new datasets represent the second installment in the COMputational database of Polycyclic Aromatic Systems (COMPAS) Project.
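One simple way to picture correcting low-level values against a higher-level benchmark is an ordinary least-squares line fitted on the molecules computed at both levels. The abstract does not say which functional form the authors used, so the linear form is an assumption, the function names are hypothetical, and the numbers are synthetic placeholders rather than COMPAS-2 data.

```python
def fit_linear_correction(low_level, high_level):
    """Least-squares slope and intercept mapping low-level to high-level values."""
    n = len(low_level)
    mean_x = sum(low_level) / n
    mean_y = sum(high_level) / n
    sxx = sum((x - mean_x) ** 2 for x in low_level)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(low_level, high_level))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def apply_correction(value, slope, intercept):
    """Correct a single low-level value with the fitted line."""
    return slope * value + intercept


# Synthetic HOMO-LUMO gaps (eV) for molecules computed at both levels.
xtb_gaps = [1.0, 2.0, 3.0, 4.0]
dft_gaps = [1.5, 2.7, 3.9, 5.1]

slope, intercept = fit_linear_correction(xtb_gaps, dft_gaps)
# Estimate a DFT-quality gap for a molecule computed only at the low level.
print(round(apply_correction(2.5, slope, intercept), 3))
```

The fit is made once on the smaller benchmark subset and then applied to the full low-level dataset.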
Affiliation(s)
- Eduardo Mayo Yanes
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
- Sabyasachi Chakraborty
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
- Renana Gershoni-Poranne
  - Schulich Faculty of Chemistry, Technion - Israel Institute of Technology, Haifa, 32000, Israel
11. Back S, Aspuru-Guzik A, Ceriotti M, Gryn'ova G, Grzybowski B, Gu GH, Hein J, Hippalgaonkar K, Hormázabal R, Jung Y, Kim S, Kim WY, Moosavi SM, Noh J, Park C, Schrier J, Schwaller P, Tsuda K, Vegge T, von Lilienfeld OA, Walsh A. Accelerated chemical science with AI. Digital Discovery 2024; 3:23-33. [PMID: 38239898 PMCID: PMC10793638 DOI: 10.1039/d3dd00213f] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]
Abstract
In light of the pressing need for practical materials and molecular solutions to renewable energy and health problems, to name just two examples, one wonders how to accelerate research and development in the chemical sciences, so as to address the time it takes to bring materials from initial discovery to commercialization. Artificial intelligence (AI)-based techniques, in particular, are having a transformative and accelerating impact on many, if not most, technological domains. To shed light on these questions, the authors and participants gathered in person for the ASLLA Symposium on the theme of 'Accelerated Chemical Science with AI' at Gangneung, Republic of Korea. We present the findings, ideas, comments, and often contentious opinions expressed during four panel discussions related to the respective general topics: 'Data', 'New applications', 'Machine learning algorithms', and 'Education'. All discussions were recorded, transcribed into text using OpenAI's Whisper, and summarized using LG AI Research's EXAONE LLM, followed by revision by all authors. For the broader benefit of current researchers, educators in higher education, and academic bodies such as associations, publishers, librarians, and companies, we provide chemistry-specific recommendations and summarize the resulting conclusions.
Affiliation(s)
- Seoin Back
  - Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University, Seoul, Republic of Korea
- Alán Aspuru-Guzik
  - Departments of Chemistry, Computer Science, University of Toronto, St. George Campus, Toronto, ON, Canada
  - Acceleration Consortium and Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
- Michele Ceriotti
  - Laboratory of Computational Science and Modeling (COSMO), École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Ganna Gryn'ova
  - Heidelberg Institute for Theoretical Studies (HITS gGmbH), 69118 Heidelberg, Germany
  - Interdisciplinary Center for Scientific Computing, Heidelberg University, 69120 Heidelberg, Germany
- Bartosz Grzybowski
  - Center for Algorithmic and Robotized Synthesis (CARS), Institute for Basic Science (IBS), Ulsan, Republic of Korea
  - Institute of Organic Chemistry, Polish Academy of Sciences, Warsaw, Poland
  - Department of Chemistry, Ulsan National Institute of Science and Technology, Ulsan, Republic of Korea
- Geun Ho Gu
  - Department of Energy Engineering, Korea Institute of Energy Technology (KENTECH), Naju 58330, Republic of Korea
- Jason Hein
  - Department of Chemistry, University of British Columbia, Vancouver, BC V6T 1Z1, Canada
- Kedar Hippalgaonkar
  - School of Materials Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore 639798, Singapore
  - Institute of Materials Research and Engineering, Agency for Science Technology and Research, 2 Fusionopolis Way, 08-03, Singapore 138634, Singapore
- Yousung Jung
  - Department of Chemical and Biomolecular Engineering, KAIST, Daejeon, Republic of Korea
  - School of Chemical and Biological Engineering, Interdisciplinary Program in Artificial Intelligence, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul 08826, Republic of Korea
- Seonah Kim
  - Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO 80523, USA
- Woo Youn Kim
  - Department of Chemistry, KAIST, Daejeon, Republic of Korea
- Seyed Mohamad Moosavi
  - Chemical Engineering & Applied Chemistry, University of Toronto, Toronto, Ontario M5S 3E5, Canada
- Juhwan Noh
  - Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology, Daejeon 34114, Republic of Korea
- Joshua Schrier
  - Department of Chemistry, Fordham University, The Bronx, NY 10458, USA
- Philippe Schwaller
  - Laboratory of Artificial Chemical Intelligence (LIAC) & National Centre of Competence in Research (NCCR) Catalysis, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
- Koji Tsuda
  - Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Chiba 277-8561, Japan
  - Center for Basic Research on Materials, National Institute for Materials Science, Tsukuba, Ibaraki 305-0044, Japan
  - RIKEN Center for Advanced Intelligence Project, Tokyo 103-0027, Japan
- Tejs Vegge
  - Department of Energy Conversion and Storage, Technical University of Denmark, 301 Anker Engelunds vej, Kongens Lyngby, Copenhagen 2800, Denmark
- O Anatole von Lilienfeld
  - Acceleration Consortium and Vector Institute for Artificial Intelligence, Toronto, ON M5S 1M1, Canada
  - Departments of Chemistry, Materials Science and Engineering, and Physics, University of Toronto, St George Campus, Toronto, ON, Canada
  - Machine Learning Group, Technische Universität Berlin and Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
- Aron Walsh
  - Department of Materials, Imperial College London, London SW7 2AZ, UK
  - Department of Physics, Ewha Women's University, Seoul, Republic of Korea
12. Bain M, Godínez Castellanos JL, Bradforth SE. High-Throughput Screening for Ultrafast Photochemical Reaction Discovery. J Phys Chem Lett 2023; 14:9864-9871. [PMID: 37890453 DOI: 10.1021/acs.jpclett.3c02389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2023]
Abstract
High-repetition-rate lasers present an opportunity to extend ultrafast spectroscopy from a detailed probe of singular model photochemical systems to a routine analysis technique in training machine learning models to aid the design cycle of photochemical syntheses. We bring together innovations in line scan cameras and micro-electro-mechanical grating modulators with sample delivery via high-pressure liquid chromatography pumps to demonstrate a transient absorption spectrometer that can characterize photoreactions initiated with ultrashort ultraviolet pulses in a time scale of minutes. Furthermore, we demonstrate that the ability to rapidly screen an important class of photochemical system, pyrimidine nucleosides, can be used to explore the effect of conformational modification on the evolution of excited-state processes.
Affiliation(s)
- Matthew Bain
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089-0482, United States
- José L Godínez Castellanos
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089-0482, United States
- Stephen E Bradforth
  - Department of Chemistry, University of Southern California, Los Angeles, California 90089-0482, United States
13. Schrier J, Norquist AJ, Buonassisi T, Brgoch J. In Pursuit of the Exceptional: Research Directions for Machine Learning in Chemical and Materials Science. J Am Chem Soc 2023; 145:21699-21716. [PMID: 37754929 DOI: 10.1021/jacs.3c04783] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 09/28/2023]
Abstract
Exceptional molecules and materials with one or more extraordinary properties are both technologically valuable and fundamentally interesting, because they often involve new physical phenomena or new compositions that defy expectations. Historically, exceptionality has been achieved through serendipity, but recently, machine learning (ML) and automated experimentation have been widely proposed to accelerate target identification and synthesis planning. In this Perspective, we argue that the data-driven methods commonly used today are well-suited for optimization but not for the realization of new exceptional materials or molecules. Finding such outliers should be possible using ML, but only by shifting away from using traditional ML approaches that tweak the composition, crystal structure, or reaction pathway. We highlight case studies of high-Tc oxide superconductors and superhard materials to demonstrate the challenges of ML-guided discovery and discuss the limitations of automation for this task. We then provide six recommendations for the development of ML methods capable of exceptional materials discovery: (i) Avoid the tyranny of the middle and focus on extrema; (ii) When data are limited, qualitative predictions that provide direction are more valuable than interpolative accuracy; (iii) Sample what can be made and how to make it and defer optimization; (iv) Create room (and look) for the unexpected while pursuing your goal; (v) Try to fill-in-the-blanks of input and output space; (vi) Do not confuse human understanding with model interpretability. We conclude with a description of how these recommendations can be integrated into automated discovery workflows, which should enable the discovery of exceptional molecules and materials.
Affiliation(s)
- Joshua Schrier
  - Department of Chemistry, Fordham University, The Bronx, New York 10458, United States
- Alexander J Norquist
  - Department of Chemistry, Haverford College, Haverford, Pennsylvania 19041, United States
- Tonio Buonassisi
  - Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Jakoah Brgoch
  - Department of Chemistry and Texas Center for Superconductivity, University of Houston, Houston, Texas 77204, United States

14
Mahjour B, Zhang R, Shen Y, McGrath A, Zhao R, Mohamed OG, Lin Y, Zhang Z, Douthwaite JL, Tripathi A, Cernak T. Rapid planning and analysis of high-throughput experiment arrays for reaction discovery. Nat Commun 2023; 14:3924. [PMID: 37400469 DOI: 10.1038/s41467-023-39531-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 02/22/2023] [Accepted: 06/13/2023] [Indexed: 07/05/2023] Open
Abstract
High-throughput experimentation (HTE) is an increasingly important tool in reaction discovery. While the hardware for running HTE in the chemical laboratory has evolved significantly in recent years, there remains a need for software solutions to navigate data-rich experiments. Here we have developed phactor™, a software that facilitates the performance and analysis of HTE in a chemical laboratory. phactor™ allows experimentalists to rapidly design arrays of chemical reactions or direct-to-biology experiments in 24, 96, 384, or 1,536 wellplates. Users can access online reagent data, such as a chemical inventory, to virtually populate wells with experiments and produce instructions to perform the reaction array manually, or with the assistance of a liquid handling robot. After completion of the reaction array, analytical results can be uploaded for facile evaluation, and to guide the next series of experiments. All chemical data, metadata, and results are stored in machine-readable formats that are readily translatable to various software. We also demonstrate the use of phactor™ in the discovery of several chemistries, including the identification of a low micromolar inhibitor of the SARS-CoV-2 main protease. Furthermore, phactor™ has been made available for free academic use in 24- and 96-well formats via an online interface.
Affiliation(s)
- Babak Mahjour
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Rui Zhang
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA
- Yuning Shen
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Andrew McGrath
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Ruheng Zhao
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Osama G Mohamed
  - Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Yingfu Lin
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Zirong Zhang
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- James L Douthwaite
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Ashootosh Tripathi
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Natural Products Discovery Core, Life Sciences Institute, University of Michigan, Ann Arbor, MI, USA
- Tim Cernak
  - Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, USA

15
Statt MJ, Rohr BA, Guevarra D, Suram SK, Morrell TE, Gregoire JM. The Materials Provenance Store. Sci Data 2023; 10:184. [PMID: 37024515 PMCID: PMC10079965 DOI: 10.1038/s41597-023-02107-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/22/2022] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open
Abstract
We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials Provenance Store (MPS), manages the metadata and experimental provenance from acquisition of raw materials, through synthesis, to a broad range of materials characterization techniques. Given the primary research goal of materials discovery of solar fuels materials, many of the characterization experiments involve electrochemistry, along with optical, structural, and compositional characterizations. The MPS is populated with all information required for executing common data queries, which typically do not involve direct query of raw data. The result is a database file that can be distributed to users so that they can independently execute queries and subsequently download the data of interest. We propose this strategy as an approach to manage the highly heterogeneous and distributed data that arises from materials science experiments, as demonstrated by the management of over 30 million experiments run on over 12 million samples in the present MPS release.
Affiliation(s)
- Dan Guevarra
  - Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, 91125, USA
  - Liquid Sunlight Alliance, California Institute of Technology, Pasadena, CA, 91125, USA
- Thomas E Morrell
  - Caltech Library, California Institute of Technology, Pasadena, CA, 91125, USA
- John M Gregoire
  - Division of Engineering and Applied Science, California Institute of Technology, Pasadena, CA, 91125, USA
  - Liquid Sunlight Alliance, California Institute of Technology, Pasadena, CA, 91125, USA

16
Boiko DA, Kashin AS, Sorokin VR, Agaev YV, Zaytsev RG, Ananikov VP. Analyzing ionic liquid systems using real-time electron microscopy and a computational framework combining deep learning and classic computer vision techniques. J Mol Liq 2023. [DOI: 10.1016/j.molliq.2023.121407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/11/2023]

17
Duke R, Bhat V, Risko C. Data storage architectures to accelerate chemical discovery: data accessibility for individual laboratories and the community. Chem Sci 2022; 13:13646-13656. [PMID: 36544717 PMCID: PMC9710231 DOI: 10.1039/d2sc05142g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/06/2022] [Indexed: 11/11/2022] Open
Abstract
As buzzwords like "big data," "machine learning," and "high-throughput" expand through chemistry, chemists need to consider more than ever their data storage, data management, and data accessibility, whether in their own laboratories or with the broader community. While it is commonplace for chemists to use spreadsheets for data storage and analysis, a move towards database architectures ensures that the data can be more readily findable, accessible, interoperable, and reusable (FAIR). However, making this move has several challenges for those with limited-to-no knowledge of computer programming and databases. This Perspective presents basics of data management using databases with a focus on chemical data. We overview database fundamentals by exploring benefits of database use, introducing terminology, and establishing database design principles. We then detail the extract, transform, and load process for database construction, which includes an overview of data parsing and database architectures, spanning Standard Query Language (SQL) and No-SQL structures. We close by cataloging overarching challenges in database design. This Perspective is accompanied by an interactive demonstration available at https://github.com/D3TaLES/databases_demo. We do all of this within the context of chemical data with the aim of equipping chemists with the knowledge and skills to store, manage, and share their data while abiding by FAIR principles.
Affiliation(s)
- Rebekah Duke
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Vinayak Bhat
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA
- Chad Risko
  - Department of Chemistry & Center for Applied Energy Research, University of Kentucky, Lexington, Kentucky 40506, USA

18
Murrieta-Dueñas R, Serrano-Rubio J, López-Ramírez V, Segovia-Dominguez I, Cortez-González J. Prediction of microbial growth via the hyperconic neural network approach. Chem Eng Res Des 2022. [DOI: 10.1016/j.cherd.2022.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/03/2022]
|