Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Cited by Other Article(s)

Tetko IV, van Deursen R, Godin G. Be aware of overfitting by hyperparameter optimization! J Cheminform 2024;16:139. [PMID: 39654058 PMCID: PMC11629497 DOI: 10.1186/s13321-024-00934-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2024] [Accepted: 11/22/2024] [Indexed: 12/12/2024]

Ramos MC, White AD. Predicting small molecules solubility on endpoint devices using deep ensemble neural networks. Digital Discovery 2024;3:786-795. [PMID: 38638648 PMCID: PMC11022985 DOI: 10.1039/d3dd00217a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Accepted: 03/07/2024] [Indexed: 04/20/2024]

Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024;15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Accepted: 12/04/2023] [Indexed: 01/22/2024]

Abstract

Designing solvent systems is key to achieving the facile synthesis and separation of desired products from chemical processes, so many machine learning models have been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in the models' predictive accuracy and generalizability; this can be addressed by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, these two databases should not be trained separately, and they should not be simply combined without reconciling the discrepancies from different magnitudes of errors and uncertainties. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training approaches. Solubilities from quantum-mechanical calculations are referred to during semi-supervised learning, but they are not directly added to the experimental database. Dataset augmentation is performed from 11 637 experimental solubilities to >900 000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. The accuracy (mean absolute error around 0.2 kcal mol⁻¹ for the test set) is quantitatively useful in exploring Linear Free Energy Relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room for expanding solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive to predictive chemistry areas where experimental, computational, and other heterogeneous data should be combined.
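The abstract does not say how partition coefficients are derived from predicted solvation free energies. A minimal sketch, assuming the standard thermodynamic relation log P = (ΔG_water − ΔG_organic) / (RT ln 10), is given below; the function names and the default temperature of 298.15 K are illustrative choices, not taken from the paper.

```python
import math

# Gas constant in kcal mol^-1 K^-1.
R_KCAL = 1.987204e-3


def log_p(dg_water, dg_organic, temperature=298.15):
    """Estimate log10 of the organic/water partition coefficient.

    Assumes the standard relation between solvation free energies
    (in kcal/mol) and partitioning; this is not stated in the abstract.
    """
    return (dg_water - dg_organic) / (R_KCAL * temperature * math.log(10))


def log_units_per_kcal(temperature=298.15):
    """Number of log10 units that correspond to 1 kcal/mol at this temperature."""
    return 1.0 / (R_KCAL * temperature * math.log(10))


if __name__ == "__main__":
    # Hypothetical solute: 2 kcal/mol more stable in the organic phase.
    print(round(log_p(-4.0, -6.0), 3))
    # How an error of 0.2 kcal/mol propagates into log units.
    print(round(0.2 * log_units_per_kcal(), 3))
```

Under this assumption, a free-energy error of 0.2 kcal/mol at room temperature corresponds to roughly 0.15 log units per solvation free energy.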

Ahmad W, Tayara H, Shim H, Chong KT. SolPredictor: Predicting Solubility with Residual Gated Graph Neural Network. Int J Mol Sci 2024;25:715. [PMID: 38255790 PMCID: PMC10815788 DOI: 10.3390/ijms25020715] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/06/2023] [Revised: 12/26/2023] [Accepted: 01/04/2024] [Indexed: 01/24/2024]

Chaka MD, Mekonnen YS, Wu Q, Geffe CA. Advancing energy storage through solubility prediction: leveraging the potential of deep learning. Phys Chem Chem Phys 2023;25:31836-31847. [PMID: 37966375 DOI: 10.1039/d3cp03992g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]

Abstract

Solubility prediction plays a crucial role in energy storage applications, such as redox flow batteries, because it directly affects the efficiency and reliability. Researchers have developed various methods that utilize quantum calculations and descriptors to predict the aqueous solubilities of organic molecules. Notably, machine learning models based on descriptors have shown promise for solubility prediction. As deep learning tools, graph neural networks (GNNs) have emerged to capture complex structure-property relationships for material property prediction. Specifically, MolGAT, a type of GNN model, was designed to incorporate n-dimensional edge attributes, enabling the modeling of intricacies in molecular graphs and enhancing the prediction capabilities. In a previous study, MolGAT successfully screened 23 467 promising redox-active molecules from a database of over 500 000 compounds, based on redox potential predictions. This study focused on applying the MolGAT model to predict the aqueous solubility (log S) of a broad range of organic compounds, including those previously screened for redox activity. The model was trained on a diverse sample of 8494 organic molecules from AqSolDB and benchmarked against literature data, demonstrating superior accuracy compared with other state-of-the-art graph-based and descriptor-based models. Subsequently, the trained MolGAT model was employed to screen redox-active organic compounds identified in the first phase of high-throughput virtual screening, targeting favorable solubility in energy storage applications. The second round of screening, which considered solubility, yielded 12 332 promising redox-active and soluble organic molecules suitable for use in aqueous redox flow batteries. Thus, the two-phase high-throughput virtual screening approach utilizing MolGAT, specifically trained for redox potential and solubility, is an effective strategy for selecting suitable intrinsically soluble redox-active molecules from extensive databases, potentially advancing energy storage through reliable material development. This indicates that the model is reliable for predicting the solubility of various molecules and provides valuable insights for energy storage, pharmaceutical, environmental, and chemical applications.

Reinhardt A, Chew PY, Cheng B. A streamlined molecular-dynamics workflow for computing solubilities of molecular and ionic crystals. J Chem Phys 2023;159:184110. [PMID: 37962445 DOI: 10.1063/5.0173341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Accepted: 10/20/2023] [Indexed: 11/15/2023]

Conn JM, Carter JW, Conn JJA, Subramanian V, Baxter A, Engkvist O, Llinas A, Ratkova EL, Pickett SD, McDonagh JL, Palmer DS. Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models. J Chem Inf Model 2023;63:1099-1113. [PMID: 36758178 PMCID: PMC9976279 DOI: 10.1021/acs.jcim.2c01189] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 02/11/2023]

Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS Omega 2023;8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]

Chew PY, Reinhardt A. Phase diagrams-Why they matter and how to predict them. J Chem Phys 2023;158:030902. [PMID: 36681642 DOI: 10.1063/5.0131028] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]

Hamre JR, Jafri MS. Optimizing peptide inhibitors of SARS-Cov-2 nsp10/nsp16 methyltransferase predicted through molecular simulation and machine learning. Informatics in Medicine Unlocked 2022;29:100886. [PMID: 35252541 PMCID: PMC8883729 DOI: 10.1016/j.imu.2022.100886] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/23/2021] [Revised: 02/04/2022] [Accepted: 02/16/2022] [Indexed: 11/30/2022]

Blow KE, Quigley D, Sosso GC. The seven deadly sins: When computing crystal nucleation rates, the devil is in the details. J Chem Phys 2021;155:040901. [PMID: 34340373 DOI: 10.1063/5.0055248] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/20/2022]

Prediction of Protein Solubility Based on Sequence Feature Fusion and DDcCNN. Interdiscip Sci 2021;13:703-716. [PMID: 34236625 DOI: 10.1007/s12539-021-00456-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2021] [Revised: 06/21/2021] [Accepted: 06/23/2021] [Indexed: 10/20/2022]

Abstract

BACKGROUND

Prediction of protein solubility is an indispensable prerequisite for pharmaceutical research and production. The objective of this work is to design a new model for predicting protein solubility that uses protein sequence feature fusion and deep dual-channel convolutional neural networks (DDcCNN) to improve on the performance of existing prediction models.

METHODS

The redundancy of the raw protein data is reduced with CD-HIT. Four subsequences are built from each protein sequence: one global and three local. The global subsequence is the entire protein sequence, and the local subsequences are obtained by moving a sliding window according to a set of rules. G-gap features are extracted from the four subsequences to construct a mixed matrix that serves as the input of one channel, which consists of three convolutional layers. Additional features are extracted with the SCRATCH tool as the input of the other channel, which consists of a single convolution, in order to find hidden relationships and improve the accuracy of the predictor. The outputs of the two parallel channels are concatenated as the input of the hidden layer, and the protein solubility prediction is obtained in the output layer. The best protein solubility prediction model is selected through comparative experiments on different frameworks.
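The abstract does not give the window rules or the exact G-gap definition. The sketch below assumes the common g-gap dipeptide composition (the frequency of residue pairs separated by g positions) and a plain fixed-size sliding window; the window size, stride, and function names are hypothetical and are not the authors' settings.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered residue pairs, in a fixed order so vectors are comparable.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def g_gap_features(seq, g):
    """Return the 400-dimensional g-gap dipeptide composition of seq.

    A pair is formed by residue i and residue i + g + 1, so g = 0
    gives ordinary adjacent dipeptides.
    """
    counts = Counter(seq[i] + seq[i + g + 1] for i in range(len(seq) - g - 1))
    total = sum(counts[p] for p in PAIRS)
    return [counts[p] / total if total else 0.0 for p in PAIRS]


def local_windows(seq, size, stride):
    """Cut seq into fixed-size windows (hypothetical rule, for illustration)."""
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]


if __name__ == "__main__":
    protein = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up sequence
    # One global subsequence plus local windows, each mapped to a feature row.
    subsequences = [protein] + local_windows(protein, size=10, stride=6)
    matrix = [g_gap_features(s, g=1) for s in subsequences]
    print(len(matrix), len(matrix[0]))
```

Stacking one feature row per subsequence yields a matrix of the kind the abstract describes as the input to the first convolutional channel.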

RESULTS

The performance indicators of the DDcCNN model (designed in this work) are as follows: accuracy of 77.82%, Matthews correlation coefficient of 0.57, sensitivity of 76.13%, and specificity of 79.32%. Comparative experiments show that the overall performance of the DDcCNN model is better than that of existing models (GCNN, LCNN, and PCNN). The related models and data are publicly deposited at http://www.ddccnn.wang.
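For readers unfamiliar with these indicators, the following sketch shows how accuracy, sensitivity, specificity, and the Matthews correlation coefficient are computed from a binary confusion matrix. The counts in the usage example are hypothetical and do not reproduce the paper's results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical counts for a soluble (positive) vs insoluble (negative) test set.
    for name, value in binary_metrics(tp=40, fp=20, tn=30, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Unlike accuracy, the Matthews correlation coefficient accounts for all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity when classes are imbalanced.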

CONCLUSION

The satisfactory performance of the DDcCNN model shows that these features and flexible computational methodologies can reinforce existing models for predicting protein solubility. The model could be applied in several ways, such as preselecting initial targets that are soluble or altering the solubility of target proteins, and thus can help to reduce production costs.

Fowles DJ, Palmer DS, Guo R, Price SL, Mitchell JBO. Toward Physics-Based Solubility Computation for Pharmaceuticals to Rival Informatics. J Chem Theory Comput 2021;17:3700-3709. [PMID: 33988381 PMCID: PMC8190954 DOI: 10.1021/acs.jctc.1c00130] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023]

Francoeur PG, Koes DR. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J Chem Inf Model 2021;61:2530-2536. [PMID: 34038123 DOI: 10.1021/acs.jcim.1c00331] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Indexed: 11/28/2022]

Synergistic Computational Modeling Approaches as Team Players in the Game of Solubility Predictions. J Pharm Sci 2020;110:22-34. [PMID: 33217423 DOI: 10.1016/j.xphs.2020.10.068] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/20/2020] [Revised: 10/23/2020] [Accepted: 10/28/2020] [Indexed: 11/23/2022]

Ansari N, Karmakar T, Parrinello M. Molecular Mechanism of Gas Solubility in Liquid: Constant Chemical Potential Molecular Dynamics Simulations. J Chem Theory Comput 2020;16:5279-5286. [PMID: 32551636 DOI: 10.1021/acs.jctc.0c00450] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]

Wyttenbach N, Niederquell A, Kuentz M. Machine Estimation of Drug Melting Properties and Influence on Solubility Prediction. Mol Pharm 2020;17:2660-2671. [DOI: 10.1021/acs.molpharmaceut.0c00355] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/22/2022]

Mulligan VK. The emerging role of computational design in peptide macrocycle drug discovery. Expert Opin Drug Discov 2020;15:833-852. [PMID: 32345066 DOI: 10.1080/17460441.2020.1751117] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/21/2022]

Anwar J, Leitold C, Peters B. Solid–solid phase equilibria in the NaCl–KCl system. J Chem Phys 2020;152:144109. [DOI: 10.1063/5.0003224] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 01/23/2023]

Abramov YA, Sun G, Zeng Q, Zeng Q, Yang M. Guiding Lead Optimization for Solubility Improvement with Physics-Based Modeling. Mol Pharm 2020;17:666-673. [DOI: 10.1021/acs.molpharmaceut.9b01138] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 01/30/2023]

Kumoro AC, Retnowati DS, Ratnawati R, Widiyanti M. Estimation of aqueous solubility of starch from various botanical sources using Flory Huggins theory approach. Chem Eng Commun 2019. [DOI: 10.1080/00986445.2019.1691539] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]

Boothroyd S, Anwar J. Solubility prediction for a soluble organic molecule via chemical potentials from density of states. J Chem Phys 2019;151:184113. [PMID: 31731842 DOI: 10.1063/1.5117281] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]