Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Kaleel M, Torrisi M, Mooney C, Pollastri G. PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning. Amino Acids 2019;51:1289-96. [PMID: 31388850 DOI: 10.1007/s00726-019-02767-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]

For:	Kaleel M, Torrisi M, Mooney C, Pollastri G. PaleAle 5.0: prediction of protein relative solvent accessibility by deep learning. Amino Acids 2019;51:1289-96. [PMID: 31388850 DOI: 10.1007/s00726-019-02767-6] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/10/2019] [Accepted: 07/29/2019] [Indexed: 10/26/2022]

Number

Cited by Other Article(s)

Angaitkar P, Janghel RR, Sahu TP. DL-TCNN: Deep Learning-based Temporal Convolutional Neural Network for prediction of conformational B-cell epitopes. 3 Biotech 2023;13:297. [PMID: 37575599 PMCID: PMC10412510 DOI: 10.1007/s13205-023-03716-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2023] [Accepted: 07/24/2023] [Indexed: 08/15/2023] Open

Gormez Y, Aydin Z. IGPRED-MultiTask: A Deep Learning Model to Predict Protein Secondary Structure, Torsion Angles and Solvent Accessibility. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023;20:1104-1113. [PMID: 35849663 DOI: 10.1109/tcbb.2022.3191395] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]

Mignon J, Mottet D, Leyder T, Uversky VN, Perpète EA, Michaux C. Structural characterisation of amyloidogenic intrinsically disordered zinc finger protein isoforms DPF3b and DPF3a. Int J Biol Macromol 2022;218:57-71. [PMID: 35863661 DOI: 10.1016/j.ijbiomac.2022.07.102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2022] [Revised: 07/08/2022] [Accepted: 07/13/2022] [Indexed: 11/05/2022]

Villalobos-Alva J, Ochoa-Toledo L, Villalobos-Alva MJ, Aliseda A, Pérez-Escamirosa F, Altamirano-Bustamante NF, Ochoa-Fernández F, Zamora-Solís R, Villalobos-Alva S, Revilla-Monsalve C, Kemper-Valverde N, Altamirano-Bustamante MM. Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field. Front Bioeng Biotechnol 2022;10:788300. [PMID: 35875501 PMCID: PMC9301016 DOI: 10.3389/fbioe.2022.788300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/02/2021] [Accepted: 05/25/2022] [Indexed: 11/23/2022] Open

Abstract

Proteins are some of the most fascinating and challenging molecules in the universe, and they pose a big challenge for artificial intelligence. The implementation of machine learning/AI in protein science gives rise to a world of knowledge adventures in the workhorse of the cell and proteome homeostasis, which are essential for making life possible. This opens up epistemic horizons thanks to a coupling of human tacit–explicit knowledge with machine learning power, the benefits of which are already tangible, such as important advances in protein structure prediction. Moreover, the driving force behind the protein processes of self-organization, adjustment, and fitness requires a space corresponding to gigabytes of life data in its order of magnitude. There are many tasks such as novel protein design, protein folding pathways, and synthetic metabolic routes, as well as protein-aggregation mechanisms, pathogenesis of protein misfolding and disease, and proteostasis networks that are currently unexplored or unrevealed. In this systematic review and biochemical meta-analysis, we aim to contribute to bridging the gap between what we call binomial artificial intelligence (AI) and protein science (PS), a growing research enterprise with exciting and promising biotechnological and biomedical applications. We undertake our task by exploring “the state of the art” in AI and machine learning (ML) applications to protein science in the scientific literature to address some critical research questions in this domain, including What kind of tasks are already explored by ML approaches to protein sciences? What are the most common ML algorithms and databases used? What is the situational diagnostic of the AI–PS inter-field? What do ML processing steps have in common? We also formulate novel questions such as Is it possible to discover what the rules of protein evolution are with the binomial AI–PS? How do protein folding pathways evolve? What are the rules that dictate the folds? What are the minimal nuclear protein structures? How do protein aggregates form and why do they exhibit different toxicities? What are the structural properties of amyloid proteins? How can we design an effective proteostasis network to deal with misfolded proteins? We are a cross-functional group of scientists from several academic disciplines, and we have conducted the systematic review using a variant of the PICO and PRISMA approaches. The search was carried out in four databases (PubMed, Bireme, OVID, and EBSCO Web of Science), resulting in 144 research articles. After three rounds of quality screening, 93 articles were finally selected for further analysis. A summary of our findings is as follows: regarding AI applications, there are mainly four types: 1) genomics, 2) protein structure and function, 3) protein design and evolution, and 4) drug design. In terms of the ML algorithms and databases used, supervised learning was the most common approach (85%). As for the databases used for the ML models, PDB and UniprotKB/Swissprot were the most common ones (21 and 8%, respectively). Moreover, we identified that approximately 63% of the articles organized their results into three steps, which we labeled pre-process, process, and post-process. A few studies combined data from several databases or created their own databases after the pre-process. Our main finding is that, as of today, there are no research road maps serving as guides to address gaps in our knowledge of the AI–PS binomial. All research efforts to collect, integrate multidimensional data features, and then analyze and validate them are, so far, uncoordinated and scattered throughout the scientific literature without a clear epistemic goal or connection between the studies. Therefore, our main contribution to the scientific literature is to offer a road map to help solve problems in drug design, protein structures, design, and function prediction while also presenting the “state of the art” on research in the AI–PS binomial until February 2021. Thus, we pave the way toward future advances in the synthetic redesign of novel proteins and protein networks and artificial metabolic pathways, learning lessons from nature for the welfare of humankind. Many of the novel proteins and metabolic pathways are currently non-existent in nature, nor are they used in the chemical industry or biomedical field.

Collapse

Affiliation(s)

Jalil Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Luis Ochoa-Toledo Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Mario Javier Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Atocha Aliseda Instituto de Investigaciones Filosóficas, Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Fernando Pérez-Escamirosa Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Nelly F. Altamirano-Bustamante Instituto Nacional de Pediatría, Mexico City, Mexico
Francine Ochoa-Fernández Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Ricardo Zamora-Solís Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Sebastián Villalobos-Alva Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Cristina Revilla-Monsalve Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico
Nicolás Kemper-Valverde Instituto de Ciencias Aplicadas y Tecnología (ICAT), Universidad Nacional Autónoma de México (UNAM), Mexico City, Mexico
Myriam M. Altamirano-Bustamante Unidad de Investigación en Enfermedades Metabólicas, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Mexico City, Mexico *Correspondence: Myriam M. Altamirano-Bustamante,

Collapse

Badaczewska-Dawid AE, Uversky VN, Potoyan DA. BIAPSS: A Comprehensive Physicochemical Analyzer of Proteins Undergoing Liquid-Liquid Phase Separation. Int J Mol Sci 2022;23:ijms23116204. [PMID: 35682883 PMCID: PMC9181037 DOI: 10.3390/ijms23116204] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2022] [Revised: 05/22/2022] [Accepted: 05/27/2022] [Indexed: 02/06/2023] Open

Yu CH, Chen W, Chiang YH, Guo K, Martin Moldes Z, Kaplan DL, Buehler MJ. End-to-End Deep Learning Model to Predict and Design Secondary Structure Content of Structural Proteins. ACS Biomater Sci Eng 2022;8:1156-1165. [PMID: 35129957 PMCID: PMC9347213 DOI: 10.1021/acsbiomaterials.1c01343] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Yaqoob C, Shahid S, Khaliq A, un Nisa Z, Khan IH, Akhtar MW. Designing Fusion Molecules from Antigens of Mycobacterium tuberculosis to Enhance Serodiagnostic Sensitivity in Latent TB Infection and Active TB State. Int J Pept Res Ther 2021. [DOI: 10.1007/s10989-021-10341-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Manfredi M, Savojardo C, Martelli PL, Casadio R. DeepREx-WS: A web server for characterising protein-solvent interaction starting from sequence. Comput Struct Biotechnol J 2021;19:5791-5799. [PMID: 34765094 PMCID: PMC8566768 DOI: 10.1016/j.csbj.2021.10.016] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/05/2021] [Revised: 10/07/2021] [Accepted: 10/07/2021] [Indexed: 11/23/2022] Open

Abstract

Protein–solvent interaction provides important features for protein surface engineering when the structure is absent or partially solved. Presently, we can integrate the notion of solvent exposed/buried residues with that of their flexibility and intrinsic disorder to highlight regions where mutations may increase or decrease protein stability in order to modify proteins for biotechnological reasons, while preserving their functional integrity. Here we describe a web server, which provides the unique possibility of integrating knowledge of solvent and non-solvent exposure with that of residue conservation, flexibility and disorder of a protein sequence, for a better understanding of which regions are relevant for protein integrity. The core of the webserver is DeepREx, a novel deep learning-based tool that classifies each residue in the sequence as buried or exposed. DeepREx is trained on a high-quality, non-redundant dataset derived from the Protein Data Bank comprising 2332 monomeric protein chains and benchmarked on a blind test set including 200 protein sequences unrelated with the training set. Results show that DeepREx performs at the state-of-the-art in the field. In turn, the Web Server, DeepREx-WS, supplements the predictions of DeepREx with features that allow a better characterisation of exposed and buried regions: i) residue conservation derived from multiple sequence alignment; ii) local sequence hydrophobicity; iii) residue flexibility computed with MEDUSA; iv) a predictor of secondary structure; v) the presence of disordered regions as derived from MobiDB-Lite3.0. The web server allows browsing, selecting and intersecting the different features. We demonstrate a possible application of the DeepREx-WS for assisting the identification of residues to be variated in protein surface engineering processes.

Collapse

Improved protein relative solvent accessibility prediction using deep multi-view feature learning framework. Anal Biochem 2021;631:114358. [PMID: 34478704 DOI: 10.1016/j.ab.2021.114358] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 08/22/2021] [Accepted: 08/25/2021] [Indexed: 11/20/2022]

Abstract

The accurate prediction of the relative solvent accessibility of a protein is critical to understanding its 3D structure and biological function. In this study, a novel deep multi-view feature learning (DMVFL) framework that integrates three different neural network units, i.e., bidirectional long short-term memory recurrent neural network, squeeze-and-excitation, and fully-connected hidden layer, with four sequence-based single-view features, i.e., position-specific scoring matrix, position-specific frequency matrix, predicted secondary structure, and roughly predicted three-state relative solvent accessibility probability, is developed to accurately predict relative solvent accessibility information of protein. On the basis of this newly developed framework, one new protein relative solvent accessibility predictor was proposed and called DMVFL-RSA, which employs a customized multiple feedback mechanism that helps to extract discriminative information embedded in the four single-view features. In benchmark tests on TEST524 and CASP14-derived (CASP14set) datasets, DMVFL-RSA outperforms other existing state-of-the-art protein relative solvent accessibility predictors when predicting two-state (exposure threshold of 25%), three-state (exposure thresholds of 9% and 36%), and four-state (exposure thresholds of 4%, 25%, and 50%) discrete values. For real-valued prediction on TEST524 and CASP14set, DMVFL-RSA has also gained high Pearson correlation coefficient values, indicating a positive correlation between the predicted and native relative solvent accessibility. Detailed analyses show that the major advantages of DMVFL-RSA lie in the high efficiency of the DMVFL framework, the applied multiple feedback mechanism, and the strong sensitivity of the sequence-based features. The web server of DMVFL-RSA is freely available at https://jun-csbio.github.io/DMVFL-RSA/for academic use. The standalone package of DMVFL-RSA is downloadable at https://github.com/XueQiangFan/DMVFL-RSA.

Collapse

Dong T, Gong T, Li W. Accurate Estimation of Solvent Accessible Surface Area for Coarse-Grained Biomolecular Structures with Deep Learning. J Phys Chem B 2021;125:9490-9498. [PMID: 34383495 DOI: 10.1021/acs.jpcb.1c05203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]

Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021;22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open

Kaleel M, Ellinger L, Lalor C, Pollastri G, Mooney C. SCLpred-MEM: Subcellular localization prediction of membrane proteins by deep N-to-1 convolutional neural networks. Proteins 2021;89:1233-1239. [PMID: 33983651 DOI: 10.1002/prot.26144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2020] [Revised: 02/22/2021] [Accepted: 05/06/2021] [Indexed: 11/11/2022]

iEnhancer-GAN: A Deep Learning Framework in Combination with Word Embedding and Sequence Generative Adversarial Net to Identify Enhancers and Their Strength. Int J Mol Sci 2021;22:ijms22073589. [PMID: 33808317 PMCID: PMC8036415 DOI: 10.3390/ijms22073589] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2021] [Revised: 03/10/2021] [Accepted: 03/24/2021] [Indexed: 12/13/2022] Open

Abstract

As critical components of DNA, enhancers can efficiently and specifically manipulate the spatial and temporal regulation of gene transcription. Malfunction or dysregulation of enhancers is implicated in a slew of human pathology. Therefore, identifying enhancers and their strength may provide insights into the molecular mechanisms of gene transcription and facilitate the discovery of candidate drug targets. In this paper, a new enhancer and its strength predictor, iEnhancer-GAN, is proposed based on a deep learning framework in combination with the word embedding and sequence generative adversarial net (Seq-GAN). Considering the relatively small training dataset, the Seq-GAN is designed to generate artificial sequences. Given that each functional element in DNA sequences is analogous to a “word” in linguistics, the word segmentation methods are proposed to divide DNA sequences into “words”, and the skip-gram model is employed to transform the “words” into digital vectors. In view of the powerful ability to extract high-level abstraction features, a convolutional neural network (CNN) architecture is constructed to perform the identification tasks, and the word vectors of DNA sequences are vertically concatenated to form the embedding matrices as the input of the CNN. Experimental results demonstrate the effectiveness of the Seq-GAN to expand the training dataset, the possibility of applying word segmentation methods to extract “words” from DNA sequences, the feasibility of implementing the skip-gram model to encode DNA sequences, and the powerful prediction ability of the CNN. Compared with other state-of-the-art methods on the training dataset and independent test dataset, the proposed method achieves a significantly improved overall performance. It is anticipated that the proposed method has a certain promotion effect on enhancer related fields.

Collapse

Savojardo C, Manfredi M, Martelli PL, Casadio R. Solvent Accessibility of Residues Undergoing Pathogenic Variations in Humans: From Protein Structures to Protein Sequences. Front Mol Biosci 2021;7:626363. [PMID: 33490109 PMCID: PMC7817970 DOI: 10.3389/fmolb.2020.626363] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2020] [Accepted: 12/07/2020] [Indexed: 01/08/2023] Open

Torrisi M, Pollastri G. Brewery: deep learning and deeper profiles for the prediction of 1D protein structure annotations. Bioinformatics 2020;36:3897-3898. [PMID: 32207516 DOI: 10.1093/bioinformatics/btaa204] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2020] [Revised: 03/17/2020] [Accepted: 03/19/2020] [Indexed: 11/13/2022] Open

Kaleel M, Zheng Y, Chen J, Feng X, Simpson JC, Pollastri G, Mooney C. SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks. Bioinformatics 2020;36:3343-3349. [PMID: 32142105 DOI: 10.1093/bioinformatics/btaa156] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/10/2019] [Revised: 02/25/2020] [Accepted: 03/02/2020] [Indexed: 11/14/2022] Open

de Brevern AG. Impact of protein dynamics on secondary structure prediction. Biochimie 2020;179:14-22. [PMID: 32946990 DOI: 10.1016/j.biochi.2020.09.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Revised: 09/04/2020] [Accepted: 09/10/2020] [Indexed: 02/08/2023]

Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020;29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]

Abstract

The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.

Collapse

Torrisi M, Pollastri G, Le Q. Deep learning methods in protein structure prediction. Comput Struct Biotechnol J 2020;18:1301-1310. [PMID: 32612753 PMCID: PMC7305407 DOI: 10.1016/j.csbj.2019.12.011] [Citation(s) in RCA: 102] [Impact Index Per Article: 25.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2019] [Revised: 12/19/2019] [Accepted: 12/20/2019] [Indexed: 01/01/2023] Open