Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Barrett R, White AD. Investigating Active Learning and Meta-Learning for Iterative Peptide Design. J Chem Inf Model 2021;61:95-105. [PMID: 33350829 PMCID: PMC7842147 DOI: 10.1021/acs.jcim.0c00946] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Indexed: 01/14/2023]

For:	Barrett R, White AD. Investigating Active Learning and Meta-Learning for Iterative Peptide Design. J Chem Inf Model 2021;61:95-105. [PMID: 33350829 PMCID: PMC7842147 DOI: 10.1021/acs.jcim.0c00946] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/13/2020] [Indexed: 01/14/2023]

Number

Cited by Other Article(s)

Lv Q, Chen G, Yang Z, Zhong W, Chen CYC. Meta-MolNet: A Cross-Domain Benchmark for Few Examples Drug Discovery. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 2025;36:4849-4863. [PMID: 40038923 DOI: 10.1109/tnnls.2024.3359657] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 03/06/2025]

Abstract

Predicting the pharmacological activity, toxicity, and pharmacokinetic properties of molecules is a central task in drug discovery. Existing machine learning methods are transferred from one resource rich molecular property to another data scarce property in the same scaffold dataset. However, existing models may produce fragile and highly uncertain predictions for new scaffold molecules. And these models were tested on different benchmarks, which seriously affected the quality of their evaluation results. In this article, we introduce Meta-MolNet, a collection of data benchmark and algorithms, which is a standard benchmark platform for measuring model generalization and uncertainty quantification capabilities. Meta-MolNet manages a wide range of molecular datasets with high ratio of molecules/scaffolds, which often leads to more difficult data shift and generalization problems. Furthermore, we propose a graph attention network based on cross-domain meta-learning, Meta-GAT, which uses bilevel optimization to learn meta-knowledge from the scaffold family molecular dataset in the source domain. Meta-GAT benefits from meta-knowledge that reduces the requirement of sample complexity to enable reliable predictions of new scaffold molecules in the target domain through internal iteration of a few examples. We evaluate existing methods as baselines for the community, and the Meta-MolNet benchmark demonstrates the effectiveness of measuring the proposed algorithm in domain generalization and uncertainty quantification. Extensive experiments demonstrate that the Meta-GAT model has state-of-the-art domain generalization performance and robustly estimates uncertainty under few examples constraints. By publishing AI-ready data, evaluation frameworks, and baseline results, we hope to see the Meta-MolNet suite become a comprehensive resource for the AI-assisted drug discovery community. Meta-MolNet is freely accessible at https://github.com/lol88/Meta-MolNet.

Collapse

Ochoa R, Deibler K. PepFuNN: Novo Nordisk Open-Source Toolkit to Enable Peptide in Silico Analysis. J Pept Sci 2025;31:e3666. [PMID: 39777768 PMCID: PMC11706630 DOI: 10.1002/psc.3666] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/17/2024] [Revised: 12/04/2024] [Accepted: 12/09/2024] [Indexed: 01/11/2025]

Dosajh A, Agrawal P, Chatterjee P, Priyakumar UD. Modern machine learning methods for protein property prediction. Curr Opin Struct Biol 2025;90:102990. [PMID: 39881454 DOI: 10.1016/j.sbi.2025.102990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2024] [Revised: 12/06/2024] [Accepted: 01/04/2025] [Indexed: 01/31/2025]

Kashafutdinova IM, Poyezzhayeva A, Gimadiev T, Madzhidov T. Active learning approaches in molecule pKi prediction. Mol Inform 2025;44:e202400154. [PMID: 39105614 DOI: 10.1002/minf.202400154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2024] [Revised: 06/21/2024] [Accepted: 06/23/2024] [Indexed: 08/07/2024]

Abstract

During the early stages of drug design, identifying compounds with suitable bioactivities is crucial. Given the vast array of potential drug databases, it's feasible to assay only a limited subset of candidates. The optimal method for selecting the candidates, aiming to minimize the overall number of assays, involves an active learning (AL) approach. In this work, we benchmarked a range of AL strategies with two main objectives: (1) to identify a strategy that ensures high model performance and (2) to select molecules with desired properties using minimal assays. To evaluate the different AL strategies, we employed the simulated AL workflow based on "virtual" experiments. These experiments leveraged ChEMBL datasets, which come with known biological activity values for the molecules. Furthermore, for classification tasks, we proposed the hybrid selection strategy that unified both exploration and exploitation AL strategies into a single acquisition function, defined by parameters n and c. We have also shown that popular minimal margin and maximal variance selection approaches for exploration selection correspond to minimization of the hybrid acquisition function with n=1 and 2 respectively. The balance between the exploration and exploitation strategies can be adjusted using a coefficient (c), making the optimal strategy selection straightforward. The primary strength of the hybrid selection method lies in its adaptability; it offers the flexibility to adjust the criteria for molecule selection based on the specific task by modifying the value of the contribution coefficient. Our analysis revealed that, in regression tasks, AL strategies didn't succeed at ensuring high model performance, however, they were successful in selecting molecules with desired properties using minimal number of tests. In analogous experiments in classification tasks, exploration strategy and the hybrid selection function with a constant c<1 (for n=1) and c≤0.2 (for n=2) were effective in achieving the goal of constructing a high-performance predictive model using minimal data. When searching for molecules with desired properties, exploitation, and the hybrid function with c≥1 (n=1) and c≥0.7 (n=2) demonstrated efficiency identifying molecules in fewer iterations compared to random selection method. Notably, when the hybrid function was set to an intermediate coefficient value (c=0.7), it successfully addressed both tasks simultaneously.

Collapse

Ansari M, White AD. Learning peptide properties with positive examples only. DIGITAL DISCOVERY 2024;3:977-986. [PMID: 38756224 PMCID: PMC11094695 DOI: 10.1039/d3dd00218g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 11/05/2023] [Accepted: 03/30/2024] [Indexed: 05/18/2024]

Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023;123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 79] [Impact Index Per Article: 39.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/01/2023]

Abstract

Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues, such as data diversity, imputation, noise, imbalance, and high-dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.

Collapse

Deng H, Ding M, Wang Y, Li W, Liu G, Tang Y. ACP-MLC: A two-level prediction engine for identification of anticancer peptides and multi-label classification of their functional types. Comput Biol Med 2023;158:106844. [PMID: 37058760 DOI: 10.1016/j.compbiomed.2023.106844] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/07/2023] [Revised: 03/09/2023] [Accepted: 03/30/2023] [Indexed: 04/07/2023]

Eastwood JRB, Weisberg EI, Katz D, Zuckermann RN, Kirshenbaum K. Guidelines for designing peptoid structures: Insights from the Peptoid Data Bank. Pept Sci (Hoboken) 2023. [DOI: 10.1002/pep2.24307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/11/2023]

Guo JL, Januszyk M, Longaker MT. Machine Learning in Tissue Engineering. Tissue Eng Part A 2023;29:2-19. [PMID: 35943870 PMCID: PMC9885550 DOI: 10.1089/ten.tea.2022.0128] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2022] [Accepted: 08/02/2022] [Indexed: 02/03/2023] Open

Zhang Y, Qiu L, Ren Y, Cheng Z, Li L, Yao S, Zhang C, Luo Z, Lu H. A meta-learning approach to improving radiation response prediction in cancers. Comput Biol Med 2022;150:106163. [PMID: 37070625 DOI: 10.1016/j.compbiomed.2022.106163] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2022] [Revised: 09/18/2022] [Accepted: 10/01/2022] [Indexed: 11/03/2022]

Abstract

PURPOSE

Predicting the efficacy of radiotherapy in individual patients has drawn widespread attention, but the limited sample size remains a bottleneck for utilizing high-dimensional multi-omics data to guide personalized radiotherapy. We hypothesize the recently developed meta-learning framework could address this limitation.

METHODS AND MATERIALS

By combining gene expression, DNA methylation, and clinical data of 806 patients who had received radiotherapy from The Cancer Genome Atlas (TCGA), we applied the Model-Agnostic Meta-Learning (MAML) framework to tasks consisting of pan-cancer data, to obtain the best initial parameters of a neural network for a specific cancer with smaller number of samples. The performance of meta-learning framework was compared with four traditional machine learning methods based on two training schemes, and tested on Cancer Cell Line Encyclopedia (CCLE) and Chinese Glioma Genome Atlas (CGGA) datasets. Moreover, biological significance of the models was investigated by survival analysis and feature interpretation.

RESULTS

The mean AUC (Area under the ROC Curve) [95% confidence interval] of our models across nine cancer types was 0.702 [0.691-0.713], which improved by 0.166 on average over other the four machine learning methods on two training schemes. Our models performed significantly better (p < 0.05) in seven cancer types and performed comparable to the other predictors in the rest of two cancer types. The more pan-cancer samples were used to transfer meta-knowledge, the greater the performance improved (p < 0.05). The predicted response scores that our models generated were negatively correlated with cell radiosensitivity index in four cancer types (p < 0.05), while not statistically significant in the other three cancer types. Moreover, the predicted response scores were shown to be prognostic factors in seven cancer types and eight potential radiosensitivity-related genes were identified.

CONCLUSIONS

For the first time, we established the meta-learning approach to improving individual radiation response prediction by transferring common knowledge from pan-cancer data with MAML framework. The results demonstrated the superiority, generalizability, and biological significance of our approach.

Collapse

Nguyen D, Tao L, Li Y. Integration of Machine Learning and Coarse-Grained Molecular Simulations for Polymer Materials: Physical Understandings and Molecular Design. Front Chem 2022;9:820417. [PMID: 35141207 PMCID: PMC8819075 DOI: 10.3389/fchem.2021.820417] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Accepted: 12/31/2021] [Indexed: 12/21/2022] Open

Abstract

In recent years, the synthesis of monomer sequence-defined polymers has expanded into broad-spectrum applications in biomedical, chemical, and materials science fields. Pursuing the characterization and inverse design of these polymer systems requires our fundamental understanding not only at the individual monomer level, but also considering the chain scales, such as polymer configuration, self-assembly, and phase separation. However, our accessibility to this field is still rudimentary due to the limitations of traditional design approaches, the complexity of chemical space along with the burdened cost and time issues that prevent us from unveiling the underlying monomer sequence-structure-property relationships. Fortunately, thanks to the recent advancements in molecular dynamics simulations and machine learning (ML) algorithms, the bottlenecks in the tasks of establishing the structure-function correlation of the polymer chains can be overcome. In this review, we will discuss the applications of the integration between ML techniques and coarse-grained molecular dynamics (CGMD) simulations to solve the current issues in polymer science at the chain level. In particular, we focus on the case studies in three important topics-polymeric configuration characterization, feed-forward property prediction, and inverse design-in which CGMD simulations are leveraged to generate training datasets to develop ML-based surrogate models for specific polymer systems and designs. By doing so, this computational hybridization allows us to well establish the monomer sequence-functional behavior relationship of the polymers as well as guide us toward the best polymer chain candidates for the inverse design in undiscovered chemical space with reasonable computational cost and time. Even though there are still limitations and challenges ahead in this field, we finally conclude that this CGMD/ML integration is very promising, not only in the attempt of bridging the monomeric and macroscopic characterizations of polymer materials, but also enabling further tailored designs for sequence-specific polymers with superior properties in many practical applications.

Collapse

Shekar V, Nicholas G, Ani Najeeb M, Zeile M, Yu V, Wang X, Slack D, Li Z, Nega PW, Chan E, Norquist AJ, Schrier J, Friedler SA. Active Meta-Learning for Predicting and Selecting Perovskite Crystallization Experiments. J Chem Phys 2022;156:064108. [DOI: 10.1063/5.0076636] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open

Tynes M, Gao W, Burrill DJ, Batista ER, Perez D, Yang P, Lubbers N. Pairwise Difference Regression: A Machine Learning Meta-algorithm for Improved Prediction and Uncertainty Quantification in Chemical Search. J Chem Inf Model 2021;61:3846-3857. [PMID: 34347460 DOI: 10.1021/acs.jcim.1c00670] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]