1
Maldonado-Estudillo J, Navarro Crespo R, Marcos-Fernández Á, Caputto MDDD, Cruz-Jiménez G, Báez JE. Experimental Design (2⁴) to Improve the Reaction Conditions of Non-Segmented Poly(ester-urethanes) (PEUs) Derived from α,ω-Hydroxy Telechelic Poly(ε-caprolactone) (HOPCLOH). Polymers (Basel) 2025; 17:668. [PMID: 40076160 PMCID: PMC11902744 DOI: 10.3390/polym17050668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2025] [Revised: 02/17/2025] [Accepted: 02/26/2025] [Indexed: 03/14/2025] Open
Abstract
Aliphatic unsegmented polyurethanes (PUs) have garnered relatively limited attention in the literature, despite their valuable properties such as UV resistance and biocompatibility, making them suitable for biomedical applications. This study focuses on synthesizing poly(ester-urethanes) (PEUs) using 1,6-hexamethylene diisocyanate and the macrodiol α,ω-hydroxy telechelic poly(ε-caprolactone) (HOPCLOH). To optimize the synthesis, a statistical experimental design approach was employed, a methodology not commonly utilized in polymer science. The influence of reaction temperature, time, reagent concentrations, and solvent type on the resulting PEUs was investigated. Characterization techniques included FT-IR, 1H NMR, differential scanning calorimetry (DSC), gel permeation chromatography (GPC), optical microscopy, and mechanical testing. The results demonstrated that all factors significantly impacted the number-average molecular weight (Mn) as determined by GPC. Furthermore, the statistical design revealed crucial interaction effects between factors, such as a dependence between reaction time and temperature. For example, a fixed reaction time of 1 h, with the temperature varying from 50 °C to 61 °C, did not significantly alter Mn. Better reaction conditions yielded high Mn (average: 162,000 g/mol), desirable mechanical properties (elongation at break > 1000%), low levels of unreacted HOPCLOH in the PEU films (OH/ESTER response = 0.0008), and reduced crystallinity (ΔHm = 11 J/g) in the soft segment, as observed by DSC and optical microscopy. In contrast, suboptimal conditions resulted in low Mn, brittle materials with unmeasurable mechanical properties, high crystallinity, and significant amounts of residual HOPCLOH. The best experimental conditions were 61 °C, 0.176 molal, 8 h, and chloroform as the solvent (ε = 4.8).
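A two-level, four-factor (2⁴) full factorial design like the one named in the title can be sketched as follows. This is a minimal illustration, not the authors' analysis: the factor order, the coded levels (-1 = low, +1 = high), the helper names, and the synthetic response are all assumptions made here, and no values from the study are used.

```python
from itertools import product

# Factor names follow the abstract; their order here is an arbitrary choice.
FACTORS = ("temperature", "time", "concentration", "solvent")


def full_factorial(k):
    """Return all 2**k runs as tuples of coded levels (-1 = low, +1 = high)."""
    return list(product((-1, 1), repeat=k))


def effect(design, responses, columns):
    """Estimate a main effect (one column) or an interaction effect (several columns).

    The contrast sign of each run is the product of its coded levels in the
    chosen columns; the effect is the mean response at sign +1 minus the mean
    response at sign -1.
    """
    total = 0.0
    for run, y in zip(design, responses):
        sign = 1
        for c in columns:
            sign *= run[c]
        total += sign * y
    return total / (len(design) / 2)


design = full_factorial(len(FACTORS))

# Synthetic response for illustration only (hypothetical, not measured Mn):
# a temperature main effect plus a temperature x time interaction.
responses = [10 + 2 * t + 3 * t * h for (t, h, c, s) in design]

temperature_effect = effect(design, responses, [0])
time_temperature_interaction = effect(design, responses, [0, 1])
concentration_effect = effect(design, responses, [2])
```

With real data, `responses` would hold one measured value (for example Mn) per run, and a non-zero interaction term would indicate that the effect of one factor depends on the level of another.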
Affiliation(s)
- Rodrigo Navarro Crespo
- Instituto de Ciencia y Tecnología de Polímeros (ICTP), Consejo Superior de Investigaciones Científicas (CSIC), Juan de la Cierva 3, 28006 Madrid, Spain
- Ángel Marcos-Fernández
- Instituto de Ciencia y Tecnología de Polímeros (ICTP), Consejo Superior de Investigaciones Científicas (CSIC), Juan de la Cierva 3, 28006 Madrid, Spain
- María Dolores de Dios Caputto
- Instituto de Ciencia y Tecnología de Polímeros (ICTP), Consejo Superior de Investigaciones Científicas (CSIC), Juan de la Cierva 3, 28006 Madrid, Spain
- Gustavo Cruz-Jiménez
- Department of Pharmacy, University of Guanajuato (UG), Noria Alta S/N, Guanajuato 36050, Mexico
- José E. Báez
- Department of Chemistry, University of Guanajuato (UG), Noria Alta S/N, Guanajuato 36050, Mexico
2
Ramos MC, Collison CJ, White AD. A review of large language models and autonomous agents in chemistry. Chem Sci 2025; 16:2514-2572. [PMID: 39829984 PMCID: PMC11739813 DOI: 10.1039/d4sc03921a] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/15/2024] [Accepted: 12/03/2024] [Indexed: 01/22/2025] Open
Abstract
Large language models (LLMs) have emerged as powerful tools in chemistry, significantly impacting molecule design, property prediction, and synthesis optimization. This review highlights LLM capabilities in these domains and their potential to accelerate scientific discovery through automation. We also review LLM-based autonomous agents: LLMs with a broader set of tools to interact with their surrounding environment. These agents perform diverse tasks such as paper scraping, interfacing with automated laboratories, and synthesis planning. As agents are an emerging topic, we extend the scope of our review of agents beyond chemistry and discuss them across scientific domains. This review covers the recent history, current capabilities, and design of LLMs and autonomous agents, addressing specific challenges, opportunities, and future directions in chemistry. Key challenges include data quality and integration, model interpretability, and the need for standard benchmarks, while future directions point towards more sophisticated multi-modal agents and enhanced collaboration between agents and experimental methods. Due to the quick pace of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.
Affiliation(s)
- Mayk Caldas Ramos
- FutureHouse Inc., San Francisco, CA, USA
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
- Christopher J Collison
- School of Chemistry and Materials Science, Rochester Institute of Technology, Rochester, NY, USA
- Andrew D White
- FutureHouse Inc., San Francisco, CA, USA
- Department of Chemical Engineering, University of Rochester, Rochester, NY, USA
3
Mazumdar H, Khondakar KR, Das S, Halder A, Kaushik A. Artificial intelligence for personalized nanomedicine; from material selection to patient outcomes. Expert Opin Drug Deliv 2025; 22:85-108. [PMID: 39645588 DOI: 10.1080/17425247.2024.2440618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 11/15/2024] [Accepted: 12/06/2024] [Indexed: 12/09/2024]
Abstract
INTRODUCTION: Artificial intelligence (AI) is changing the field of nanomedicine by exploring novel nanomaterials for developing therapies of high efficacy. AI works on larger datasets, finding sought-after nano-properties for different therapeutic aims and eventually enhancing nanomaterials' safety and effectiveness. AI leverages patient clinical and genetic data to predict outcomes, guide treatments, and optimize drug dosages and forms, enhancing benefits while minimizing side effects. AI-supported nanomedicine faces challenges like data fusion, ethics, and regulation, requiring better tools and interdisciplinary collaboration. This review highlights the importance of AI regarding patient care and urges scientists, medical professionals, and regulators to adopt AI for better outcomes. AREAS COVERED: Personalized nanomedicine, material discovery, AI-driven therapeutics, data integration, drug delivery, and patient-centric care. EXPERT OPINION: Today, AI can improve personalized health and wellness through the discovery of new types of drug nanocarriers, nanomedicines with specific properties to tackle targeted medical needs, and an increase in efficacy along with safety. Nevertheless, problems such as ethical issues, data security, and unbalanced data sets need to be addressed. Potential future developments involve using AI and quantum computing together and exploring telemedicine, i.e., the Internet-of-Medical-Things (IoMT) approach, which can enhance the quality of patient care in a personalized manner through timely decision-making.
Affiliation(s)
- Hirak Mazumdar
- Department of Computer Science and Engineering, Adamas University, Kolkata, India
- Suparna Das
- Department of Computer Science and Engineering, BVRIT HYDERABAD College of Engineering for Women, Hyderabad, India
- Animesh Halder
- Department of Electrical and Electronics Engineering, Adamas University, Kolkata, India
- Ajeet Kaushik
- Nano Biotech Laboratory, Department of Environmental Engineering, Florida Polytechnic University, Lakeland, FL, USA
4
Rani N, Kumar R, Mazumder S. AI-Driven Discovery of Asymmetric Pauson-Khand Reactions: A New Toolbox in a Synthetic Chemist's Treasure. J Phys Chem A 2024; 128:10452-10463. [PMID: 39570149 DOI: 10.1021/acs.jpca.4c06701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2024]
Abstract
Enantioselective catalytic reactions have a significant impact on chemical synthesis, and they are important components in an experimental chemist's toolbox. However, development of asymmetric catalysts often relies on the chemical intuition and experience of a synthetic chemist, making the process both time-consuming and resource-intensive. The machine-learning-assisted reaction discovery can serve as a very efficient platform for obtaining high-performing catalysts in a time-economical manner without extensive experimentation. Herein, we report a data-driven and machine learning method for reliably predicting enantiomeric excess (%ee) of 211 asymmetric Pauson-Khand reactions (PKR 1-PKR 211) between a variety of 45 unique 1,6-enyne substrates and 12 unique axially chiral biaryl ligands in the presence of different reaction conditions like varying CO gas pressure, temperature, and solvent polarity. Four different machine learning algorithms have been studied: extreme gradient boosting (XGBoost), random forest (RF), light gradient boosting machine (LGBM), and neural network (NN). A fivefold cross validation method was applied to our k-means SMOTE-augmented data set to obtain the optimized hyperparameters for the training set, and subsequently, these parameters were used in the test data set. In the case of the out-of-box set, the XGBoost method is found to be superior among all four machine learning methods investigated. Our out-of-box samples contain a total of 12 unique asymmetric Pauson-Khand reactions (PKR 212-PKR 223) arising from three new 1,3-benzodioxole-based SEGPHOS catalysts, which were never included in the training set. The XGBoost algorithm shows an impressive root mean square error (RMSE) of 7.06 (±1.11) in predicting %ee. The XGBoost-predicted %ee values match reasonably well with the experimental results. 
The absolute difference between the experimental and XGBoost-calculated %ee values ranges from 0.9 to 7.6 for the majority of the out-of-box Pauson-Khand reactions. The reactions with the fluoro-substituted SEGPHOS ligand L14 show smaller deviations from the experimental %ee values than the reactions with the L13 and L15 catalysts, in which the benzodioxole units do not have fluorine atoms. Finally, we have discovered a library of 3357 lead reactions with excellent %ee (≥99) by engaging the experimentally unknown combinations of the catalysts, substrates, and reaction conditions. The axially chiral biaryl catalysts and enyne substrates present in the library are synthetically accessible. The ligand space in the library is dominated by the presence of tol-BINAP and the DTBM-OMe-BIPHEP ligands. The substrate space is predominantly occupied by NTs-tethered, O-tethered, NBn-tethered, and C(CO2Me)2-tethered 1,6-enynes that have an H or methyl functional group present in the alkyne unit. Our newly discovered library helps a synthetic chemist develop a highly enantioselective PKR by starting with a priori knowledge without extensive trial-and-error experimentation.
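The two evaluation ingredients mentioned in this abstract, fivefold cross-validation and the root mean square error (RMSE) between experimental and predicted %ee, can be sketched in plain Python. This is a minimal illustration under stated assumptions: the helper names are hypothetical, the folds are contiguous and unshuffled, the %ee values are made up for the example, and neither the k-means SMOTE augmentation nor the XGBoost model is reproduced.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds.

    No shuffling is done here; in practice the sample order is usually
    randomized before splitting.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


# Illustrative values only (hypothetical, not taken from the study).
experimental_ee = [90.0, 85.0, 70.0, 95.0]
predicted_ee = [92.0, 80.0, 73.0, 95.0]

error = rmse(experimental_ee, predicted_ee)
absolute_deviations = [abs(e - p) for e, p in zip(experimental_ee, predicted_ee)]

# Splitting 211 reactions (the data set size quoted in the abstract) into 5 folds.
folds = list(k_fold_indices(211, k=5))
```

Each fold serves once as the validation set while the model is fitted on the remaining four; hyperparameters are then chosen by the average validation error.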
Affiliation(s)
- Neha Rani
- Department of Chemistry, Indian Institute of Technology Jammu, Jammu 181221, India
- Rohit Kumar
- Novartis, HITEC City, Hyderabad, Telangana 500081, India
- Shivnath Mazumder
- Department of Chemistry, Indian Institute of Technology Jammu, Jammu 181221, India
5
Uceda RG, Gijón A, Míguez‐Lago S, Cruz CM, Blanco V, Fernández‐Álvarez F, Álvarez de Cienfuegos L, Molina‐Solana M, Gómez‐Romero J, Miguel D, Mota AJ, Cuerva JM. Can Deep Learning Search for Exceptional Chiroptical Properties? The Halogenated [6]Helicene Case. Angew Chem Int Ed Engl 2024; 63:e202409998. [PMID: 39329214 PMCID: PMC11586703 DOI: 10.1002/anie.202409998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2024] [Revised: 09/11/2024] [Accepted: 09/24/2024] [Indexed: 09/28/2024]
Abstract
The relationship between chemical structure and chiroptical properties is not always clearly understood. Nowadays, efforts to develop new systems with enhanced optical properties follow the trial-error method. A large number of data would allow us to obtain more robust conclusions and guide research toward molecules with practical applications. In this sense, in this work we predict the chiroptical properties of millions of halogenated [6]helicenes in terms of the rotatory strength (R). We have used DFT calculations to randomly create derivatives including from 1 to 16 halogen atoms, that were then used as a data set to train different deep neural network models. These models allow us to i) predict the Rmax for any halogenated [6]helicene with a very low computational cost, and ii) to understand the physical reasons that favour some substitutions over others. Finally, we synthesized derivatives with higher predicted Rmax obtaining excellent correlation among the values obtained experimentally and the predicted ones.
Affiliation(s)
- Rafael G. Uceda
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Alfonso Gijón
- Departamento de Ciencias de la Computación e Inteligencia Artificial, UGR, E.T.S. de Ingenierías Informática y de Telecomunicación, C/ Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain
- Sandra Míguez‐Lago
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Carlos M. Cruz
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Víctor Blanco
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Fátima Fernández‐Álvarez
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Luis Álvarez de Cienfuegos
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Instituto de Investigación Biosanitaria, Avda. Madrid, 15, 18016 Granada, Spain
- Miguel Molina‐Solana
- Departamento de Ciencias de la Computación e Inteligencia Artificial, UGR, E.T.S. de Ingenierías Informática y de Telecomunicación, C/ Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain
- Juan Gómez‐Romero
- Departamento de Ciencias de la Computación e Inteligencia Artificial, UGR, E.T.S. de Ingenierías Informática y de Telecomunicación, C/ Periodista Daniel Saucedo Aranda S/N, 18071 Granada, Spain
- Delia Miguel
- Departamento de Fisicoquímica, UEQ, UGR, Facultad de Farmacia, Avda. Profesor Clavera s/n, C. U. Cartuja, 18071 Granada, Spain
- Antonio J. Mota
- Departamento de Química Inorgánica, UEQ, UGR, Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
- Juan M. Cuerva
- Departamento de Química Orgánica, Unidad de Excelencia de Química Aplicada a la Biomedicina y Medioambiente (UEQ), Universidad de Granada (UGR), Facultad de Ciencias, C. U. Fuentenueva, 18071 Granada, Spain
6
Iyamuremye A, Twagilimana I, Niyonzima FN. Examining the utilization of web-based discussion tools in teaching and learning organic chemistry in selected Rwandan secondary schools. Heliyon 2024; 10:e39356. [PMID: 39498082 PMCID: PMC11532257 DOI: 10.1016/j.heliyon.2024.e39356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 09/30/2024] [Accepted: 10/12/2024] [Indexed: 11/07/2024] Open
Abstract
In recent years, the teaching and learning of organic chemistry have frequently faced challenges due to limited student engagement and participation. Consequently, there is a growing demand for innovative teaching methods to tackle these issues. In this context, web-based discussions have emerged as a hopeful approach to enhance students' engagement and foster critical thinking skills. Therefore, the present study investigated the level of adoption of web-based discussion tools in teaching organic chemistry in Rwandan secondary schools for addressing the challenge of limited student engagement and participation. A quantitative research approach relying on a survey questionnaire was used to collect data from 133 secondary school chemistry teachers in Kamonyi and Gasabo districts. The findings indicate that 78 % of teachers do not use web-based discussion tools, while 22 % have integrated these tools into their teaching. The preferred platforms among users include WhatsApp groups, Google Docs, and Google Classroom. Additionally, the study highlights key organic chemistry topics such as alkanes, polymers, polymerization, and alcohol that can be effectively taught through these tools. Statistical analysis using ANCOVA did not show significant differences in the use of web-based discussion tools based on factors like school location, teachers' age, school ownership, and teaching experience, with p-values of 0.817, 0.234, 0.380, and 0.051, respectively. However, the borderline significance related to teaching experience (p = 0.051) suggests a potential trend. A significant difference was observed in terms of gender, with male teachers more likely to use these tools (p = 0.015). The study offers valuable insights into the factors influencing the adoption of web-based discussion tools in Rwanda, offering useful guidance for educators and curriculum developers to create more engaging and inclusive chemistry lessons.
Affiliation(s)
- Aloys Iyamuremye
- University of Rwanda-College of Education, Kayonza, Rwanda
- African Center for Excellence for Innovative in Teaching and Learning Mathematics and Science (ACEITLMS), Kayonza, Rwanda
- Innocent Twagilimana
- University of Rwanda-College of Education, Kayonza, Rwanda
- African Center for Excellence for Innovative in Teaching and Learning Mathematics and Science (ACEITLMS), Kayonza, Rwanda
- Francois Niyongabo Niyonzima
- University of Rwanda-College of Education, Kayonza, Rwanda
- African Center for Excellence for Innovative in Teaching and Learning Mathematics and Science (ACEITLMS), Kayonza, Rwanda
7
Sioris P, Mäkelä M, Kontunen A, Karjalainen M, Vehkaoja A, Oksala N, Roine A. Identification of Phospholipids Relevant to Cancer Tissue Using Differential Ion Mobility Spectrometry. Int J Mol Sci 2024; 25:11002. [PMID: 39456784 PMCID: PMC11508011 DOI: 10.3390/ijms252011002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2024] [Revised: 10/04/2024] [Accepted: 10/07/2024] [Indexed: 10/28/2024] Open
Abstract
Phospholipids are the main building components of cell membranes and are also used for cell signaling and as energy storages. Cancer cells alter their lipid metabolism, which ultimately leads to an increase in phospholipids in cancer tissue. Surgical energy instruments use electrical or vibrational energy to heat tissues, which causes intra- and extracellular water to expand rapidly and degrade cell structures, bursting the cells, which causes the formation of a tissue aerosol or smoke depending on the amount of energy used. This gas phase analyte can then be analyzed via gas analysis methods. Differential mobility spectrometry (DMS) is a method that can be used to differentiate malignant tissue from benign tissues in real time via the analysis of surgical smoke produced by energy instruments. Previously, the DMS identification of cancer tissue was based on a 'black box method' by differentiating the 2D dispersion plots of samples. This study sets out to find datapoints from the DMS dispersion plots that represent relevant target molecules. We studied the ability of DMS to differentiate three subclasses of phospholipids (phosphatidylcholine, phosphatidylinositol, and phosphatidylethanolamine) from a control sample using a bovine skeletal muscle matrix with a 5 mg addition of each phospholipid subclass to the sample matrix. We trained binary classifiers using linear discriminant analysis (LDA) and support vector machines (SVM) for sample classification. We were able to identify phosphatidylcholine, -inositol, and -ethanolamine with SVM binary classification accuracies of 91%, 73%, and 66% and with LDA binary classification accuracies of 82%, 74%, and 72%, respectively. 
Phosphatidylcholine was detected with a reliable classification accuracy, but ion separation setups should be adjusted in future studies to reliably detect other relevant phospholipids such as phosphatidylinositol and phosphatidylethanolamine and improve DMS as a microanalysis method and identify other phospholipids relevant to cancer tissue.
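The binary classification and accuracy figures described in this abstract can be illustrated with a toy version of linear discriminant analysis. The sketch below is not the study's pipeline: it assumes a single scalar feature per sample, equal class priors, and a shared variance, in which case the two-class LDA decision rule reduces to a threshold at the midpoint of the class means. The function names and the feature values are hypothetical, and accuracy is computed on the training samples only for brevity.

```python
from statistics import mean


def fit_lda_1d(class0, class1):
    """Fit a one-feature, two-class LDA with equal priors and shared variance.

    Under these assumptions the decision boundary is the midpoint of the two
    class means. Returns (threshold, class1_is_high).
    """
    m0, m1 = mean(class0), mean(class1)
    return (m0 + m1) / 2, m1 > m0


def predict(x, threshold, class1_is_high):
    """Return 1 if x falls on the class-1 side of the threshold, else 0."""
    return int((x > threshold) == class1_is_high)


def accuracy(samples, labels, threshold, class1_is_high):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(predict(x, threshold, class1_is_high) == y for x, y in zip(samples, labels))
    return hits / len(samples)


# Made-up feature values (hypothetical, not DMS measurements).
control = [1.0, 1.2, 0.8, 1.1]   # label 0: matrix without added phospholipid
spiked = [2.0, 1.9, 2.3, 1.05]   # label 1: matrix with added phospholipid

threshold, class1_is_high = fit_lda_1d(control, spiked)
samples = control + spiked
labels = [0] * len(control) + [1] * len(spiked)
train_accuracy = accuracy(samples, labels, threshold, class1_is_high)
```

A realistic evaluation would use many features per dispersion plot and report accuracy on held-out samples rather than on the data used for fitting.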
Affiliation(s)
- Patrik Sioris
- Faculty of Medicine and Health Technology, Tampere University, 33520 Tampere, Finland
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Meri Mäkelä
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Olfactomics Ltd., 33720 Tampere, Finland
- Anton Kontunen
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Olfactomics Ltd., 33720 Tampere, Finland
- Markus Karjalainen
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Olfactomics Ltd., 33720 Tampere, Finland
- Antti Vehkaoja
- Faculty of Medicine and Health Technology, Tampere University, 33520 Tampere, Finland
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Niku Oksala
- Faculty of Medicine and Health Technology, Tampere University, 33520 Tampere, Finland
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Olfactomics Ltd., 33720 Tampere, Finland
- Centre for Vascular Surgery and Interventional Radiology, Tampere University Hospital, 33520 Tampere, Finland
- Antti Roine
- Faculty of Medicine and Health Technology, Tampere University, 33520 Tampere, Finland
- TAYS Cancer Centre, Tampere University Hospital, Wellbeing Services County of Pirkanmaa, 33521 Tampere, Finland
- Olfactomics Ltd., 33720 Tampere, Finland
8
John L, Nagamani S, Mahanta HJ, Vaikundamani S, Kumar N, Kumar A, Jamir E, Priyadarsinee L, Sastry GN. Molecular Property Diagnostic Suite Compound Library (MPDS-CL): a structure-based classification of the chemical space. Mol Divers 2024; 28:3243-3259. [PMID: 37902900 DOI: 10.1007/s11030-023-10752-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/05/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023]
Abstract
Molecular Property Diagnostic Suite Compound Library (MPDS-CL) is an open-source Galaxy-based cheminformatics web portal which presents a structure-based classification of the molecules. A structure-based classification of nearly 150 million unique compounds, obtained from 42 publicly available databases and curated for redundancy removal through 97 hierarchically well-defined atom composition-based portions, has been carried out. These are further subjected to a 56-bit fingerprint-based classification algorithm, which led to the formation of 56 structurally well-defined classes. The classes thus obtained were further divided into clusters based on their molecular weight. Thus, the entire set of molecules was put into 56 different classes and 625 clusters. This led to the assignment of a unique ID, named MPDS-AadharID, for each of these 149,169,443 molecules. MPDS-AadharID is akin to the unique number given to citizens in India (similar to SSN in the US and NINO in the UK). The unique features of MPDS-CL are (a) several search options, such as exact structure search, substructure search, property-based search, fingerprint-based search, using SMILES, InChIKey and key-in; (b) automatic generation of information for the processing for MPDS and other Galaxy tools; (c) providing the class and cluster of a molecule, which makes it easier and faster to search for similar molecules; and (d) information related to the presence of the molecules in multiple databases. The MPDS-CL can be accessed at https://mpds.neist.res.in:8086/ .
Affiliation(s)
- Lijo John
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Selvaraman Nagamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Hridoy Jyoti Mahanta
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- S Vaikundamani
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Nandan Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Asheesh Kumar
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Esther Jamir
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- Lipsa Priyadarsinee
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
- G Narahari Sastry
- Advanced Computation and Data Sciences Division, CSIR - North East Institute of Science and Technology, Jorhat, 785006, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, 201002, India
9
Wahab MF, Handlovic TT, Roy S, Burk RJ, Armstrong DW. Solving Advanced Task-Specific Problems in Measurement Sciences with Generative AI. Anal Chem 2024. [PMID: 39017630 DOI: 10.1021/acs.analchem.4c01734] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/18/2024]
Abstract
The Generative Pre-Trained Transformer known as ChatGPT-4 has undergone extensive pretraining on a diverse data set, enabling it to generate coherent and contextually relevant text based on the input it receives. This capability allows it to perform tasks from answering questions and has attracted significant interest in material sciences, synthetic chemistry, and drug discovery. In this work, we posed four advanced task-specific problems to ChatGPT, which were recently published in leading journals for topics in analytical chemistry, spectroscopy, bioimage super-resolution, and electrochemistry. ChatGPT-4 successfully implemented the four ideas after assigning the "persona" to the AI and posing targeted questions. We show two cases where "unguided" ChatGPT could complete the assignments with minimal human direction. The construction of a microwave spectrum from a free induction curve and super-resolution of bioimages was accomplished using this approach. Two other specific tasks, correcting a complex baseline with morphological operations of set theory and estimating the diffusion current, required additional input, e.g., equations and specific directions from the user. In each case, the MATLAB code was eventually generated by ChatGPT-4 even when the original authors did not provide any code themselves. We show that a validation test must be implemented to detect and correct mistakes made by ChatGPT-4, followed by feedback correction. These approaches will pave the way for open and transparent science and eliminate the black boxes in measurement science when it comes to advanced data processing.
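One of the tasks named in this abstract, baseline correction with morphological operations, can be sketched with a grey-scale opening (erosion followed by dilation) on a one-dimensional signal. This is a generic illustration written in Python rather than MATLAB; it is not the code generated in the study or the method of the original authors. The function names, the window half-width, and the synthetic signal are assumptions made for the example.

```python
def erode(signal, half_width):
    """Grey-scale erosion with a flat window: moving minimum."""
    n = len(signal)
    return [min(signal[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def dilate(signal, half_width):
    """Grey-scale dilation with a flat window: moving maximum."""
    n = len(signal)
    return [max(signal[max(0, i - half_width):min(n, i + half_width + 1)]) for i in range(n)]


def opening(signal, half_width):
    """Morphological opening: erosion followed by dilation.

    Features narrower than the window are removed, so the result can serve
    as a baseline estimate when the window is wider than the peaks.
    """
    return dilate(erode(signal, half_width), half_width)


def subtract_baseline(signal, half_width):
    """Return the signal minus its opening-based baseline estimate."""
    baseline = opening(signal, half_width)
    return [s - b for s, b in zip(signal, baseline)]


# Synthetic signal (hypothetical): a constant offset of 1.0 carrying one peak.
raw = [1.0, 1.0, 1.0, 4.0, 6.0, 4.0, 1.0, 1.0, 1.0]

# Window of 2 * 2 + 1 = 5 points, wider than the 3-point peak.
corrected = subtract_baseline(raw, half_width=2)
```

The window width is the main design choice: too narrow and the peaks leak into the baseline estimate, too wide and a curved baseline is followed poorly.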
Affiliation(s)
- M Farooq Wahab
- Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Troy T Handlovic
- Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Souvik Roy
- Department of Mathematics, University of Texas at Arlington, Arlington, Texas 76019, United States
- Ryan Jacob Burk
- Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
- Daniel W Armstrong
- Department of Chemistry & Biochemistry, University of Texas at Arlington, Arlington, Texas 76019, United States
10
Cardoso Rial R. AI in analytical chemistry: Advancements, challenges, and future directions. Talanta 2024; 274:125949. [PMID: 38569367 DOI: 10.1016/j.talanta.2024.125949] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 03/09/2024] [Accepted: 03/17/2024] [Indexed: 04/05/2024]
Abstract
This article explores the influence and applications of Artificial Intelligence (AI) in analytical chemistry, highlighting its potential to revolutionize the analysis of complex data sets and the development of innovative analytical methods. Additionally, it discusses the role of AI in interpreting large-scale data and optimizing experimental processes. AI has been fundamental in managing heterogeneous data and in advanced analysis of complex spectra in areas such as spectroscopy and chromatography. The article also examines the historical development of AI in chemistry, its current challenges, including the interpretation of AI models and the integration of large volumes of data. Finally, it forecasts future trends and the potential impact of AI on analytical chemistry, emphasizing the need for ethical and secure approaches in the use of AI.
Collapse
Affiliation(s)
- Rafael Cardoso Rial
- Federal Institute of Mato Grosso do Sul, 79750-000, Nova Andradina, MS, Brazil.
11
Sigmund LM, S SS, Albers A, Erdmann P, Paton RS, Greb L. Predicting Lewis Acidity: Machine Learning the Fluoride Ion Affinity of p-Block-Atom-Based Molecules. Angew Chem Int Ed Engl 2024; 63:e202401084. [PMID: 38452299 DOI: 10.1002/anie.202401084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2024] [Revised: 03/01/2024] [Accepted: 03/04/2024] [Indexed: 03/09/2024]
Abstract
"How strong is this Lewis acid?" is a question researchers often approach by calculating its fluoride ion affinity (FIA) with quantum chemistry. Here, we present FIA49k, an extensive FIA dataset with 48,986 data points calculated at the RI-DSD-BLYP-D3(BJ)/def2-QZVPP//PBEh-3c level of theory, including 13 different p-block atoms as the fluoride accepting site. The FIA49k dataset was used to train FIA-GNN, two message-passing graph neural networks, which predict gas- and solution-phase FIA values of molecules excluded from training with a mean absolute error of 14 kJ mol⁻¹ (r² = 0.93) from the SMILES string of the Lewis acid as the only input. The level of accuracy is notable, given the wide energetic range of 750 kJ mol⁻¹ spanned by FIA49k. The model's value was demonstrated with four case studies, including predictions for molecules extracted from the Cambridge Structural Database and by reproducing results from catalysis research available in the literature. Weaknesses of the model are evaluated and interpreted chemically. FIA-GNN and the FIA49k dataset can be reached via a free web app (www.grebgroup.de/fia-gnn).
Affiliation(s)
- Lukas M Sigmund
  - Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
  - Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
- Shree Sowndarya S
  - Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
- Andreas Albers
  - Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
- Philipp Erdmann
  - Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
- Robert S Paton
  - Department of Chemistry, Colorado State University, 1301 Center Avenue, Fort Collins, CO, 80523, USA
- Lutz Greb
  - Anorganisch-Chemisches Institut, Ruprecht-Karls-Universität Heidelberg, Im Neuenheimer Feld 270, 69120, Heidelberg, Germany
12
Korlepara DB, C S V, Srivastava R, Pal PK, Raza SH, Kumar V, Pandit S, Nair AG, Pandey S, Sharma S, Jeurkar S, Thakran K, Jaglan R, Verma S, Ramachandran I, Chatterjee P, Nayar D, Priyakumar UD. PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications. Sci Data 2024; 11:180. [PMID: 38336857 PMCID: PMC10858175 DOI: 10.1038/s41597-023-02872-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 12/21/2023] [Indexed: 02/12/2024] Open
Abstract
Computing binding affinities is of great importance in the drug discovery pipeline, and their prediction using advanced machine learning methods remains a major challenge because the existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values, performing better than docking scores. This holds true even for a subset of ligands that follow Lipinski's rule, and for diverse clusters of complex structures, thereby highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset classifies strong and weak binders better than docking does. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, along with trajectories, will form a new synergy, paving the way for accelerating drug discovery.
Affiliation(s)
- Divya B Korlepara
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
  - Division of Physics, School of Advanced Sciences, Vellore Institute of Technology, Chennai, 600127, India
- Vasavi C S
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
  - Department of Artificial Intelligence, School of Artificial Intelligence, Amrita Vishwa Vidyapeetham, Bengaluru, 560035, India
- Rakesh Srivastava
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Pradeep Kumar Pal
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Saalim H Raza
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Vishal Kumar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Shivam Pandit
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Aathira G Nair
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Sanjana Pandey
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shubham Sharma
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shruti Jeurkar
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India
- Kavita Thakran
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Reena Jaglan
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Shivangi Verma
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Indhu Ramachandran
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Prathit Chatterjee
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India
- Divya Nayar
  - Department of Materials Science and Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, 110016, India.
- U Deva Priyakumar
  - IHub-Data, International Institute of Information Technology, Hyderabad, 500032, India.
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, 500032, India.
13
Tokita AM, Behler J. How to train a neural network potential. J Chem Phys 2023; 159:121501. [PMID: 38127396 DOI: 10.1063/5.0160326] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 05/31/2023] [Accepted: 07/24/2023] [Indexed: 12/23/2023] Open
Abstract
The introduction of modern Machine Learning Potentials (MLPs) has led to a paradigm change in the development of potential energy surfaces for atomistic simulations. By providing efficient access to energies and forces, they allow us to perform large-scale simulations of extended systems, which are not directly accessible by demanding first-principles methods. In these simulations, MLPs can reach the accuracy of electronic structure calculations, provided that they have been properly trained and validated using a suitable set of reference data. Due to their highly flexible functional form, the construction of MLPs has to be done with great care. In this Tutorial, we describe the necessary key steps for training reliable MLPs, from data generation via training to final validation. The procedure, which is illustrated for the example of a high-dimensional neural network potential, is general and applicable to many types of MLPs.
Affiliation(s)
- Alea Miako Tokita
  - Lehrstuhl für Theoretische Chemie II, Ruhr-Universität Bochum, 44780 Bochum, Germany and Research Center Chemical Sciences and Sustainability, Research Alliance Ruhr, 44780 Bochum, Germany
- Jörg Behler
  - Lehrstuhl für Theoretische Chemie II, Ruhr-Universität Bochum, 44780 Bochum, Germany and Research Center Chemical Sciences and Sustainability, Research Alliance Ruhr, 44780 Bochum, Germany
14
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 73] [Impact Index Per Article: 36.5] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, because big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
  - Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
  - Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
15
Kee CW. Molecular Understanding and Practical In Silico Catalyst Design in Computational Organocatalysis and Phase Transfer Catalysis-Challenges and Opportunities. Molecules 2023; 28:1715. [PMID: 36838703 PMCID: PMC9966076 DOI: 10.3390/molecules28041715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/31/2022] [Revised: 02/03/2023] [Accepted: 02/05/2023] [Indexed: 02/25/2023] Open
Abstract
Through the lens of organocatalysis and phase transfer catalysis, we will examine the key components to calculate or predict catalysis-performance metrics, such as turnover frequency and measurement of stereoselectivity, via computational chemistry. The state-of-the-art tools available to calculate potential energy and, consequently, free energy, together with their caveats, will be discussed via examples from the literature. Through various examples from organocatalysis and phase transfer catalysis, we will highlight the challenges related to the mechanism, transition state theory, and solvation involved in translating calculated barriers to the turnover frequency or a metric of stereoselectivity. Examples in the literature that validated their theoretical models will be showcased. Lastly, the relevance and opportunity afforded by machine learning will be discussed.
Affiliation(s)
- Choon Wee Kee
  - Institute of Sustainability for Chemicals, Energy and Environment (ISCE2), Agency for Science, Technology and Research (A*STAR), 1 Pesek Road, Jurong Island, Singapore 627833, Republic of Singapore
16
Kondratyev V, Dryzhakov M, Gimadiev T, Slutskiy D. Generative model based on junction tree variational autoencoder for HOMO value prediction and molecular optimization. J Cheminform 2023; 15:11. [PMID: 36732800 PMCID: PMC9893566 DOI: 10.1186/s13321-023-00681-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/22/2022] [Accepted: 01/06/2023] [Indexed: 02/04/2023] Open
Abstract
In this work, we provide further development of the junction tree variational autoencoder (JT VAE) architecture in terms of implementation and application of the internal feature space of the model. Pretraining of JT VAE on a large dataset and further optimization with a regression model led to a latent space that can solve several tasks simultaneously: prediction, generation, and optimization. We use the ZINC database as a source of molecules for the JT VAE pretraining and the QM9 dataset with its HOMO values to show the application case. We evaluate our model on multiple tasks such as property (value) prediction, generation of new molecules with predefined properties, and structure modification toward the property. Across these tasks, our model shows improvements in generation and optimization tasks while preserving the precision of state-of-the-art models.
Affiliation(s)
- Vladimir Kondratyev
  - Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France
  - Telecom Paris, 19 Place Marguerite Perey, CS 20031, 91123 Palaiseau, France
- Marian Dryzhakov
  - Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France
- Timur Gimadiev
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya str., 420008 Kazan, Russia
  - Federal Research Center “Kazan Scientific Center of Russian Academy of Sciences”, 420008 Kazan, Russia
  - JSC “BIOCAD”, Petrodvortsoviy District, Strelna, Svyazi St., Bld. 34, Liter A., 198515 St. Petersburg, Russia
- Dmitriy Slutskiy
  - Computer Science and Artificial Intelligence Laboratory, ENGIE Lab CRIGEN, 4 rue Josephine Baker, 93240 Stains, France
17
Mazur H, Erbrich L, Quodbach J. Investigations into the use of machine learning to predict drug dosage form design to obtain desired release profiles for 3D printed oral medicines. Pharm Dev Technol 2023; 28:219-231. [PMID: 36715438 DOI: 10.1080/10837450.2023.2173778] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 01/31/2023]
Abstract
Three-dimensional (3D) printing, digitalization, and artificial intelligence (AI) are gaining increasing interest in modern medicine. All three aspects are combined in personalized medicine where 3D-printed dosage forms are advantageous because of their variable geometry design. The geometry design can be used to determine the surface area to volume (SA/V) ratio, which affects drug release from the dosage forms. This study investigated artificial neural networks (ANN) to predict suitable geometries for the desired dose and release profile. Filaments with 5% API load and polyvinyl alcohol were 3D printed using Fused Deposition Modeling to provide a wide variety of geometries with different dosages and SA/V ratios. These were dissolved in vitro, and the API release profiles were described mathematically. Using these data, ANN architectures were designed with the goal of predicting a suitable dosage form geometry. Poor accuracies of 68.5% in the training and 44.4% in the test settings were achieved with a classification architecture. However, the SA/V ratio could be predicted accurately with a mean squared error loss of only 0.05. This study shows that the prediction of the SA/V ratio using AI works, but not of the exact geometry. For this purpose, a global database could be built with a range of geometries to simplify the prescription process.
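As a rough illustration of the SA/V ratio mentioned in the abstract above, the sketch below computes it for a solid cylinder. The cylinder shape, the function name `cylinder_sa_to_v`, and the dimensions are assumptions chosen for illustration; they are not taken from the study, which printed a wider variety of geometries.

```python
import math


def cylinder_sa_to_v(radius_mm: float, height_mm: float) -> float:
    """Return the surface-area-to-volume ratio (in 1/mm) of a solid cylinder."""
    # Total surface: two circular faces plus the lateral wall.
    surface_area = 2 * math.pi * radius_mm * (radius_mm + height_mm)
    volume = math.pi * radius_mm ** 2 * height_mm
    return surface_area / volume


# Hypothetical dimensions: both cylinders enclose the same volume
# (50 * pi mm^3), so they would carry the same dose at a fixed drug load.
flat = cylinder_sa_to_v(radius_mm=5.0, height_mm=2.0)
tall = cylinder_sa_to_v(radius_mm=2.5, height_mm=8.0)

print(f"flat disc: SA/V = {flat:.2f} 1/mm")
print(f"tall rod:  SA/V = {tall:.2f} 1/mm")
```

The two shapes hold the same volume but differ in SA/V, which is the kind of geometric freedom the abstract describes as affecting drug release.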
Affiliation(s)
- Hellen Mazur
  - Institute of Pharmaceutics and Biopharmaceutics, Heinrich Heine University, Düsseldorf, Germany
- Leon Erbrich
  - Institute of Pharmaceutics and Biopharmaceutics, Heinrich Heine University, Düsseldorf, Germany
- Julian Quodbach
  - Institute of Pharmaceutics and Biopharmaceutics, Heinrich Heine University, Düsseldorf, Germany
  - Department of Pharmaceutics, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, Utrecht, The Netherlands
18
Davies JC, Pattison D, Hirst JD. Machine learning for yield prediction for chemical reactions using in situ sensors. J Mol Graph Model 2023; 118:108356. [PMID: 36272195 DOI: 10.1016/j.jmgm.2022.108356] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 09/30/2022] [Indexed: 11/28/2022]
Abstract
Machine learning models were developed to predict product formation from time-series reaction data for ten Buchwald-Hartwig coupling reactions. The data were provided by DeepMatter and were collected in their DigitalGlassware cloud platform. The reaction probe has 12 sensors to measure properties of interest, including temperature, pressure, and colour. Colour was a good predictor of product formation for this reaction, and machine learning models were able to learn which of the properties were important. Predictions for the current product formation (in terms of % yield) had a mean absolute error of 1.2%. For predicting 30, 60 and 120 min ahead, the error rose to 3.4, 4.1 and 4.6%, respectively. The work presented here is an example of the insight that can be obtained from applying machine learning methods to sensor data in synthetic chemistry.
Affiliation(s)
- Joseph C Davies
  - School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK
- Jonathan D Hirst
  - School of Chemistry, University of Nottingham, University Park, Nottingham, NG7 2RD, UK.
19
Xie Y, Zhang Y, Wong KC, Shi M, Peng C. Improving Chemical Reaction Prediction with Unlabeled Data. Molecules 2022; 27:5967. [PMID: 36144703 PMCID: PMC9506495 DOI: 10.3390/molecules27185967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Revised: 09/04/2022] [Accepted: 09/08/2022] [Indexed: 11/18/2022] Open
Abstract
Predicting products of organic chemical reactions is useful in chemical sciences, especially when one or more reactants are new organics. However, the performance of traditional learning models heavily relies on high-quality labeled data. In this work, to utilize unlabeled data for better prediction performance, we propose a method that combines semi-supervised learning with graph convolutional neural networks for chemical reaction prediction. First, we propose a Mean Teacher Weisfeiler–Lehman Network to find the reaction centers. Then, we construct the candidate product set. Finally, we use an Improved Weisfeiler–Lehman Difference Network to rank candidate products. Experimental results demonstrate that, with 400k labeled data, our framework can improve the top-5 accuracy by 0.7% using 35k unlabeled data. When the proportion of unlabeled data increases, the performance gain can be larger. For example, with 80k labeled data and 35k unlabeled data, the performance gain with our framework can be 1.8%.
Affiliation(s)
- Yu Xie
  - College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Yuyang Zhang
  - College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
- Ka-Chun Wong
  - Department of Computer Science, City University of Hong Kong, Hongkong 999077, China
- Meixia Shi
  - College of Chemical Engineering, Ningbo Polytechnic, Ningbo 315000, China
- Chengbin Peng
  - College of Information Science and Engineering, Ningbo University, Ningbo 315211, China
20
Oliveira ON, Oliveira MCF. Materials Discovery With Machine Learning and Knowledge Discovery. Front Chem 2022; 10:930369. [PMID: 35873055 PMCID: PMC9300917 DOI: 10.3389/fchem.2022.930369] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 04/27/2022] [Accepted: 06/16/2022] [Indexed: 12/01/2022] Open
Abstract
Machine learning and other artificial intelligence methods are gaining increasing prominence in chemistry and materials sciences, especially for materials design and discovery, and in data analysis of results generated by sensors and biosensors. In this paper, we present a perspective on this current use of machine learning, and discuss the prospects of the future impact of extending the use of machine learning to encompass knowledge discovery as an essential step towards a new paradigm of machine-generated knowledge. The reasons why results so far have been limited are given with a discussion of the limitations of machine learning in tasks requiring interpretation. Also discussed is the need to adapt the training of students and scientists in chemistry and materials sciences, to better explore the potential of artificial intelligence capabilities.
Affiliation(s)
- Osvaldo N. Oliveira
  - Sao Carlos Institute of Physics (IFSC), University of Sao Paulo, Sao Paulo, Brazil
  - *Correspondence: Osvaldo N. Oliveira Jr
21
Xue X, Huang X, Wang G. Materials genome engineering: a promising approach for the development of high-performance metal-organic frameworks. Sci Bull (Beijing) 2022; 67:1197-1200. [PMID: 36546143 DOI: 10.1016/j.scib.2022.05.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Affiliation(s)
- Xiangdong Xue
  - Advanced Innovation Center for Materials Genome Engineering, School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing 100083, China
- Xiubing Huang
  - Advanced Innovation Center for Materials Genome Engineering, School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing 100083, China.
- Ge Wang
  - Advanced Innovation Center for Materials Genome Engineering, School of Materials Science and Engineering, University of Science and Technology Beijing, Beijing 100083, China
  - Shunde Graduate School, University of Science and Technology Beijing, Shunde 528399, China.