Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Nguyen DD, Cang Z, Wei GW. A review of mathematical representations of biomolecular data. Phys Chem Chem Phys 2020;22:4343-4367. [PMID: 32067019 PMCID: PMC7081943 DOI: 10.1039/c9cp06554g] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 12/12/2022]

Cited by Other Article(s)

Wang Z, Brand R, Adolf-Bryfogle J, Grewal J, Qi Y, Combs SA, Golovach N, Alford R, Rangwala H, Clark PM. EGGNet, a Generalizable Geometric Deep Learning Framework for Protein Complex Pose Scoring. ACS OMEGA 2024;9:7471-7479. [PMID: 38405499 PMCID: PMC10882658 DOI: 10.1021/acsomega.3c04889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 01/19/2024] [Accepted: 01/23/2024] [Indexed: 02/27/2024]

Abstract

Computational prediction of molecule-protein interactions has been key for developing new molecules to interact with a target protein for therapeutics development. Previous work includes two independent streams of approaches: (1) predicting protein-protein interactions (PPIs) between naturally occurring proteins and (2) predicting binding affinities between proteins and small-molecule ligands [also known as drug-target interaction (DTI)]. Studying the two problems in isolation has limited the ability of these computational models to generalize across the PPI and DTI tasks, both of which ultimately involve noncovalent interactions with a protein target. In this work, we developed Equivariant Graph of Graphs neural Network (EGGNet), a geometric deep learning (GDL) framework, for molecule-protein binding predictions that can handle three types of molecules for interacting with a target protein: (1) small molecules, (2) synthetic peptides, and (3) natural proteins. EGGNet leverages a graph of graphs (GoG) representation constructed from the molecular structures at atomic resolution and utilizes a multiresolution equivariant graph neural network to learn from such representations. In addition, EGGNet leverages the underlying biophysics and makes use of both atom- and residue-level interactions, which improve EGGNet's ability to rank candidate poses from blind docking. EGGNet achieves competitive performance on both a public protein-small-molecule binding affinity prediction task (80.2% top 1 success rate on CASF-2016) and a synthetic protein interface prediction task (88.4% area under the precision-recall curve). We envision that the proposed GDL framework can generalize to many other protein interaction prediction problems, such as binding site prediction and molecular docking, helping accelerate protein engineering and structure-based drug development.

Nicolle A, Deng S, Ihme M, Kuzhagaliyeva N, Ibrahim EA, Farooq A. Mixtures Recomposition by Neural Nets: A Multidisciplinary Overview. J Chem Inf Model 2024;64:597-620. [PMID: 38284618 DOI: 10.1021/acs.jcim.3c01633] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/30/2024]

Chen D, Liu J, Wei GW. TopoFormer: Multiscale Topology-enabled Structure-to-Sequence Transformer for Protein-Ligand Interaction Predictions. RESEARCH SQUARE 2024:rs.3.rs-3640878. [PMID: 38405777 PMCID: PMC10889053 DOI: 10.21203/rs.3.rs-3640878/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]

Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. Comput Biol Med 2024;169:107918. [PMID: 38194782 PMCID: PMC10922365 DOI: 10.1016/j.compbiomed.2024.107918] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2023] [Revised: 12/21/2023] [Accepted: 01/01/2024] [Indexed: 01/11/2024]

Wee J, Chen J, Xia K, Wei GW. Integration of persistent Laplacian and pre-trained transformer for protein solubility changes upon mutation. ARXIV 2023:arXiv:2310.18760v2. [PMID: 37961732 PMCID: PMC10635294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]

Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023;123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]

Abstract

Small data are often used in scientific and engineering research due to the presence of various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, big data have been the focus for the past decade, and small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), Generative Adversarial Network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.

Qureshi R, Irfan M, Gondal TM, Khan S, Wu J, Hadi MU, Heymach J, Le X, Yan H, Alam T. AI in drug discovery and its clinical relevance. Heliyon 2023;9:e17575. [PMID: 37396052 PMCID: PMC10302550 DOI: 10.1016/j.heliyon.2023.e17575] [Citation(s) in RCA: 22] [Impact Index Per Article: 22.0] [Received: 01/10/2023] [Revised: 06/17/2023] [Accepted: 06/21/2023] [Indexed: 07/04/2023] Open

Zhang S, Jin Y, Liu T, Wang Q, Zhang Z, Zhao S, Shan B. SS-GNN: A Simple-Structured Graph Neural Network for Affinity Prediction. ACS OMEGA 2023;8:22496-22507. [PMID: 37396234 PMCID: PMC10308598 DOI: 10.1021/acsomega.3c00085] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 02/02/2023] [Accepted: 06/01/2023] [Indexed: 07/04/2023]

Merkurjev E, Nguyen DD, Wei GW. Multiscale Laplacian Learning. APPL INTELL 2023;53:15727-15746. [PMID: 38031564 PMCID: PMC10686291 DOI: 10.1007/s10489-022-04333-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 11/08/2022] [Indexed: 11/29/2022]

Shen L, Feng H, Qiu Y, Wei GW. SVSBI: sequence-based virtual screening of biomolecular interactions. Commun Biol 2023;6:536. [PMID: 37202415 DOI: 10.1038/s42003-023-04866-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Accepted: 04/24/2023] [Indexed: 05/20/2023] Open

Mucllari E, Zadorozhnyy V, Ye Q, Nguyen DD. Novel Molecular Representations Using Neumann-Cayley Orthogonal Gated Recurrent Unit. J Chem Inf Model 2023;63:2656-2666. [PMID: 37075324 DOI: 10.1021/acs.jcim.2c01526] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/21/2023]

Li M, Zeng M, Zhang H, Chen H, Guan L. Biological Activity Predictions of Ligands Based on Hybrid Molecular Fingerprinting and Ensemble Learning. ACS OMEGA 2023;8:5561-5570. [PMID: 36816680 PMCID: PMC9933080 DOI: 10.1021/acsomega.2c06944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]

Chen D, Liu J, Wu J, Wei GW, Pan F, Yau ST. Path Topology in Molecular and Materials Sciences. J Phys Chem Lett 2023;14:954-964. [PMID: 36688834 PMCID: PMC10799224 DOI: 10.1021/acs.jpclett.2c03706] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]

Zhu H, Yang J, Huang N. Assessment of the Generalization Abilities of Machine-Learning Scoring Functions for Structure-Based Virtual Screening. J Chem Inf Model 2022;62:5485-5502. [PMID: 36268980 DOI: 10.1021/acs.jcim.2c01149] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]

Liu J, Xia KL, Wu J, Yau SST, Wei GW. Biomolecular Topology: Modelling and Analysis. ACTA MATHEMATICA SINICA, ENGLISH SERIES 2022;38:1901-1938. [PMID: 36407804 PMCID: PMC9640850 DOI: 10.1007/s10114-022-2326-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/27/2022] [Accepted: 07/12/2022] [Indexed: 05/25/2023]

Qiu Y, Wei GW. CLADE 2.0: Evolution-Driven Cluster Learning-Assisted Directed Evolution. J Chem Inf Model 2022;62:4629-4641. [PMID: 36154171 DOI: 10.1021/acs.jcim.2c01046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

Woodard J, Iqbal S, Mashaghi A. Circuit topology predicts pathogenicity of missense mutations. Proteins 2022;90:1634-1644. [PMID: 35394672 PMCID: PMC9543832 DOI: 10.1002/prot.26342] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/15/2021] [Revised: 03/07/2022] [Accepted: 03/30/2022] [Indexed: 12/05/2022]

Liu X, Feng H, Wu J, Xia K. Hom-Complex-Based Machine Learning (HCML) for the Prediction of Protein-Protein Binding Affinity Changes upon Mutation. J Chem Inf Model 2022;62:3961-3969. [PMID: 36040839 DOI: 10.1021/acs.jcim.2c00580] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]

Gao K, Wang R, Chen J, Cheng L, Frishcosy J, Huzumi Y, Qiu Y, Schluckbier T, Wei X, Wei GW. Methodology-Centered Review of Molecular Modeling, Simulation, and Prediction of SARS-CoV-2. Chem Rev 2022;122:11287-11368. [PMID: 35594413 PMCID: PMC9159519 DOI: 10.1021/acs.chemrev.1c00965] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Indexed: 02/06/2023]

Yamaguchi S, Nakashima H, Moriwaki Y, Terada T, Shimizu K. Prediction of protein mononucleotide binding sites using AlphaFold2 and machine learning. Comput Biol Chem 2022;100:107744. [DOI: 10.1016/j.compbiolchem.2022.107744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 07/12/2022] [Accepted: 07/22/2022] [Indexed: 11/26/2022]

Hodge theory-based biomolecular data analysis. Sci Rep 2022;12:9699. [PMID: 35690623 PMCID: PMC9188576 DOI: 10.1038/s41598-022-12877-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/26/2022] [Accepted: 05/10/2022] [Indexed: 11/16/2022] Open

Grbić J, Wu J, Xia K, Wei GW. Aspects of topological approaches for data science. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2022;4:165-216. [PMID: 36712596 PMCID: PMC9881677 DOI: 10.3934/fods.2022002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/18/2023]

Chen J, Wei GW. Mathematical artificial intelligence design of mutation-proof COVID-19 monoclonal antibodies. ARXIV 2022:arXiv:2204.09471v1. [PMID: 35475234 PMCID: PMC9040270] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]

Feng H, Gao K, Chen D, Shen L, Robison AJ, Ellsworth E, Wei GW. Machine Learning Analysis of Cocaine Addiction Informed by DAT, SERT, and NET-Based Interactome Networks. J Chem Theory Comput 2022;18:2703-2719. [PMID: 35294204 DOI: 10.1021/acs.jctc.2c00002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]

Liu X, Feng H, Wu J, Xia K. Dowker complex based machine learning (DCML) models for protein-ligand binding affinity prediction. PLoS Comput Biol 2022;18:e1009943. [PMID: 35385478 PMCID: PMC8985993 DOI: 10.1371/journal.pcbi.1009943] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Accepted: 02/21/2022] [Indexed: 11/19/2022] Open

Abstract

With the great advancements in experimental data, computational power and learning algorithms, artificial intelligence (AI) based drug design has begun to gain momentum recently. AI-based drug design has great promise to revolutionize pharmaceutical industries by significantly reducing the time and cost in drug discovery processes. However, a major issue remains for all AI-based learning models: efficient molecular representations. Here we propose, for the first time, Dowker complex (DC) based molecular interaction representations and Riemann zeta function based molecular featurization. Molecular interactions between proteins and ligands (or others) are modeled as Dowker complexes. A multiscale representation is generated by using a filtration process, during which a series of DCs are generated at different scales. Combinatorial (Hodge) Laplacian matrices are constructed from these DCs, and the Riemann zeta functions from their spectral information can be used as molecular descriptors. To validate our models, we consider protein-ligand binding affinity prediction. Our DC-based machine learning (DCML) models, in particular, DC-based gradient boosting tree (DC-GBT), are tested on the three most commonly used datasets, i.e., PDBbind-2007, PDBbind-2013 and PDBbind-2016, and extensively compared with other existing state-of-the-art models. It has been found that our DC-based descriptors can achieve state-of-the-art results and have better performance than all machine learning models with traditional molecular descriptors. Our Dowker complex based machine learning models can be used in other tasks in AI-based drug design and molecular data analysis.

With the ever-increasing accumulation of chemical and biomolecular data, data-driven artificial intelligence (AI) models will usher in an era of faster, cheaper and more efficient drug design and drug discovery. However, unlike image, text, video, and audio data, molecular data from chemistry and biology have much more complicated three-dimensional structures, as well as physical and chemical properties. Efficient molecular representations and descriptors are key to the success of machine learning models in drug design. Here, we propose, for the first time, Dowker complex based molecular representation and Riemann zeta function based molecular featurization. To characterize the complicated molecular structures and interactions at the atomic level, Dowker complexes are constructed. Based on them, intrinsic mathematical invariants are derived and used as molecular descriptors, which can be further combined with machine learning and deep learning models. Our model has achieved state-of-the-art results in protein-ligand binding affinity prediction, demonstrating its great potential for other drug design and discovery problems.

Casadio R, Martelli PL, Savojardo C. Machine learning solutions for predicting protein–protein interactions. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2022. [DOI: 10.1002/wcms.1618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]

V_HH Structural Modelling Approaches: A Critical Review. Int J Mol Sci 2022;23:ijms23073721. [PMID: 35409081 PMCID: PMC8998791 DOI: 10.3390/ijms23073721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/25/2022] [Revised: 03/23/2022] [Accepted: 03/23/2022] [Indexed: 12/20/2022] Open

Bonidia RP, Domingues DS, Sanches DS, de Carvalho ACPLF. MathFeature: feature extraction package for DNA, RNA and protein sequences based on mathematical descriptors. Brief Bioinform 2022;23:bbab434. [PMID: 34750626 PMCID: PMC8769707 DOI: 10.1093/bib/bbab434] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/29/2021] [Revised: 09/18/2021] [Accepted: 09/20/2021] [Indexed: 12/24/2022] Open

Chen J, Wei GW. Mathematical artificial intelligence design of mutation-proof COVID-19 monoclonal antibodies. COMMUNICATIONS IN INFORMATION AND SYSTEMS 2022;22:339-361. [PMID: 36713633 PMCID: PMC9881605 DOI: 10.4310/cis.2022.v22.n3.a3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]

Wei X, Wei GW. Homotopy continuation for the spectra of persistent Laplacians. FOUNDATIONS OF DATA SCIENCE (SPRINGFIELD, MO.) 2021;3:677-700. [PMID: 35822080 PMCID: PMC9273002 DOI: 10.3934/fods.2021017] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 05/31/2023]

Chen D, Zheng J, Wei GW, Pan F. Extracting Predictive Representations from Hundreds of Millions of Molecules. J Phys Chem Lett 2021;12:10793-10801. [PMID: 34723543 PMCID: PMC9358546 DOI: 10.1021/acs.jpclett.1c03058] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Indexed: 05/17/2023]

Edwards P, Skruber K, Milićević N, Heidings JB, Read TA, Bubenik P, Vitriol EA. TDAExplore: Quantitative analysis of fluorescence microscopy images through topology-based machine learning. PATTERNS 2021;2:100367. [PMID: 34820649 PMCID: PMC8600226 DOI: 10.1016/j.patter.2021.100367] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 08/31/2021] [Accepted: 09/20/2021] [Indexed: 11/02/2022]

Abstract

Recent advances in machine learning have greatly enhanced automatic methods to extract information from fluorescence microscopy data. However, current machine-learning-based models can require hundreds to thousands of images to train, and the most readily accessible models classify images without describing which parts of an image contributed to classification. Here, we introduce TDAExplore, a machine learning image analysis pipeline based on topological data analysis. It can classify different types of cellular perturbations after training with only 20–30 high-resolution images and performs robustly on images from multiple subjects and microscopy modes. Using only images and whole-image labels for training, TDAExplore provides quantitative, spatial information, characterizing which image regions contribute to classification. Computational requirements to train TDAExplore models are modest and a standard PC can perform training with minimal user input. TDAExplore is therefore an accessible, powerful option for obtaining quantitative information about imaging data in a wide variety of applications.

• TDAExplore combines topological data analysis with machine learning classification
• As few as 20–30 high-resolution images can be used to train TDAExplore models
• TDAExplore is robust to different microscopy modes, dataset size, image features
• TDAExplore quantifies where and how much each image resembles the training data

Traditional intensity-based measurements of fluorescent microscopy data limit its potential to reveal new information about its sample. Here, we present an image analysis pipeline called TDAExplore, which is based on topological data analysis and machine learning classification. In addition to being highly accurate in assigning images to their correct group, TDAExplore quantifies how much images resemble the training data and identifies which parts are different, an improvement over other machine learning models that do not permit insight into how classification tasks were made. The next steps for TDAExplore will be to expand its capabilities into three-dimensional, multivariate, and time series datasets. This work represents progress into a future where machine learning identifies and describes nuanced image features in ways that allow researchers to answer important biological questions and generate new hypotheses for future studies.

Chen J, Wang R, Wei GW. Review of the mechanisms of SARS-CoV-2 evolution and transmission. ARXIV 2021:arXiv:2109.08148v1. [PMID: 34545334 PMCID: PMC8452100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]

Abstract

The mechanism of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution and transmission is elusive and its understanding, a prerequisite to forecast emerging variants, is of paramount importance. SARS-CoV-2 evolution is driven by the mechanisms at molecular and organism scales and regulated by the transmission pathways at the population scale. In this review, we show that infectivity-based natural selection was discovered as the mechanism for SARS-CoV-2 evolution and transmission in July 2020. In April 2021, we proved beyond all doubt that such a natural selection via infectivity-based transmission pathway remained the sole mechanism for SARS-CoV-2 evolution. However, we reveal that antibody-disruptive co-mutations [Y449S, N501Y] on the spike protein receptor-binding domain (RBD) debuted as a new vaccine-resistant transmission pathway of viral evolution in highly vaccinated populations a few months ago. Over one year ago, we foresaw that mutations on RBD residues, 452 and 501, would "both have high chances to mutate into significantly more infectious COVID-19 strains". Mutations on these residues underpin prevailing SARS-CoV-2 variants Alpha, Beta, Gamma, Delta, Epsilon, Theta, Kappa, Lambda, and Mu at present and are expected to be vital to emerging variants in the future. We anticipate that viral evolution will combine RBD co-mutations at these two sites, creating future variants that are about ten times more infectious than the original SARS-CoV-2. Additionally, two complementary transmission pathways of viral evolution, i.e., infectivity and vaccine resistance will prolong our battle with COVID-19 for years. 
We predict that RBD co-mutation sets [A411S, L452R, T478K], [L452R, T478K, N501Y], [L452R, T478K, E484K, N501Y], [K417N, L452R, T478K], and [P384L, K417N, E484K, N501Y] will have a high chance to grow into dominating variants due to their high infectivity and/or strong ability to break through current vaccines, calling for the development of new vaccines and antibody therapies.

Xiong G, Shen C, Yang Z, Jiang D, Liu S, Lu A, Chen X, Hou T, Cao D. Featurization strategies for protein–ligand interactions and their applications in scoring function development. WIRES COMPUTATIONAL MOLECULAR SCIENCE 2021. [DOI: 10.1002/wcms.1567] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022]

Chen D, Gao K, Nguyen DD, Chen X, Jiang Y, Wei GW, Pan F. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nat Commun 2021;12:3521. [PMID: 34112777 PMCID: PMC8192505 DOI: 10.1038/s41467-021-23720-w] [Citation(s) in RCA: 50] [Impact Index Per Article: 16.7] [Received: 01/21/2021] [Accepted: 05/06/2021] [Indexed: 11/09/2022] Open

Qin T, Zhu Z, Wang XS, Xia J, Wu S. Computational representations of protein-ligand interfaces for structure-based virtual screening. Expert Opin Drug Discov 2021;16:1175-1192. [PMID: 34011222 DOI: 10.1080/17460441.2021.1929921] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 12/30/2022]

Szocinski T, Nguyen DD, Wei GW. AweGNN: Auto-parametrized weighted element-specific graph neural networks for molecules. Comput Biol Med 2021;134:104460. [PMID: 34020133 DOI: 10.1016/j.compbiomed.2021.104460] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/11/2021] [Revised: 04/23/2021] [Accepted: 04/26/2021] [Indexed: 11/29/2022]

Meng Z, Xia K. Persistent spectral-based machine learning (PerSpect ML) for protein-ligand binding affinity prediction. SCIENCE ADVANCES 2021;7:7/19/eabc5329. [PMID: 33962954 PMCID: PMC8104863 DOI: 10.1126/sciadv.abc5329] [Citation(s) in RCA: 73] [Impact Index Per Article: 24.3] [Received: 04/29/2020] [Accepted: 03/18/2021] [Indexed: 05/11/2023]

Taking the leap between analytical chemistry and artificial intelligence: A tutorial review. Anal Chim Acta 2021;1161:338403. [DOI: 10.1016/j.aca.2021.338403] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 12/04/2020] [Revised: 03/02/2021] [Accepted: 03/03/2021] [Indexed: 01/01/2023]

Jiang J, Wang R, Wei GW. GGL-Tox: Geometric Graph Learning for Toxicity Prediction. J Chem Inf Model 2021;61:1691-1700. [PMID: 33719422 PMCID: PMC8155789 DOI: 10.1021/acs.jcim.0c01294] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Indexed: 01/08/2023]

Liu X, Feng H, Wu J, Xia K. Persistent spectral hypergraph based machine learning (PSH-ML) for protein-ligand binding affinity prediction. Brief Bioinform 2021;22:6219114. [PMID: 33837771 DOI: 10.1093/bib/bbab127] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 12/21/2022] Open

Nguyen H, Kleingardner J. Identifying metal binding amino acids based on backbone geometries as a tool for metalloprotein engineering. Protein Sci 2021;30:1247-1257. [PMID: 33829594 DOI: 10.1002/pro.4074] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/05/2021] [Revised: 04/01/2021] [Accepted: 04/02/2021] [Indexed: 01/03/2023]

Lim S, Lu Y, Cho CY, Sung I, Kim J, Kim Y, Park S, Kim S. A review on compound-protein interaction prediction methods: Data, format, representation and model. Comput Struct Biotechnol J 2021;19:1541-1556. [PMID: 33841755 PMCID: PMC8008185 DOI: 10.1016/j.csbj.2021.03.004] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 08/31/2020] [Revised: 02/28/2021] [Accepted: 03/01/2021] [Indexed: 01/27/2023] Open

Jiang Y, Chen D, Chen X, Li T, Wei GW, Pan F. Topological representations of crystalline compounds for the machine-learning prediction of materials properties. NPJ COMPUTATIONAL MATERIALS 2021;7:28. [PMID: 34676106 PMCID: PMC8528346 DOI: 10.1038/s41524-021-00493-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/22/2020] [Accepted: 01/06/2021] [Indexed: 05/19/2023]

Ariga K. Molecular recognition at the air-water interface: nanoarchitectonic design and physicochemical understanding. Phys Chem Chem Phys 2020;22:24856-24869. [PMID: 33140772 DOI: 10.1039/d0cp04174b] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/12/2022]

Sarullo K, Matlock MK, Swamidass SJ. Site-Level Bioactivity of Small-Molecules from Deep-Learned Representations of Quantum Chemistry. J Phys Chem A 2020;124:9194-9202. [PMID: 33084331 DOI: 10.1021/acs.jpca.0c06231] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 02/06/2023]

Nguyen DD, Gao K, Chen J, Wang R, Wei GW. Unveiling the molecular mechanism of SARS-CoV-2 main protease inhibition from 137 crystal structures using algebraic topology and deep learning. Chem Sci 2020;11:12036-12046. [PMID: 34123218 PMCID: PMC8162568 DOI: 10.1039/d0sc04641h] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 08/21/2020] [Accepted: 09/30/2020] [Indexed: 12/27/2022] Open

Scalvini B, Sheikhhassani V, Woodard J, Aupič J, Dame RT, Jerala R, Mashaghi A. Topology of Folded Molecular Chains: From Single Biomolecules to Engineered Origami. TRENDS IN CHEMISTRY 2020. [DOI: 10.1016/j.trechm.2020.04.009] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 01/09/2023]

Gao K, Nguyen DD, Sresht V, Mathiowetz AM, Tu M, Wei GW. Are 2D fingerprints still valuable for drug discovery? Phys Chem Chem Phys 2020;22:8373-8390. [PMID: 32266895 PMCID: PMC7224332 DOI: 10.1039/d0cp00305k] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Indexed: 12/21/2022]

Gao K, Nguyen DD, Wang R, Wei GW. Machine intelligence design of 2019-nCoV drugs. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2020:2020.01.30.927889. [PMID: 32511308 PMCID: PMC7217289 DOI: 10.1101/2020.01.30.927889] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Indexed: 12/22/2022]