1. Rashid S, Sundaram S, Kwoh CK. Empirical Study of Protein Feature Representation on Deep Belief Networks Trained With Small Data for Secondary Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:955-966. [PMID: 35439138 DOI: 10.1109/tcbb.2022.3168676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Protein secondary structure (SS) prediction is a classic problem of computational biology and is widely used in structural characterization and to infer homology. While most SS predictors have been trained on thousands of sequences, a previous approach developed a compact model of training proteins that used an energy-based feature representation derived from the C-Alpha, C-Beta, Side Chain (CABS) algorithm. Here, the previous approach is extended to Deep Belief Networks (DBN). Deep learning methods are notorious for requiring large datasets, and there is a wide consensus that training deep models from scratch on small datasets works poorly. By contrast, we demonstrate a simple DBN architecture containing a single hidden layer, trained only on the CB513 dataset. Testing on an independent set of G Switch proteins improved the Q3 score of the previous compact model by almost 3%. The findings are further confirmed by comparison to several deep learning models which are trained on thousands of proteins. Finally, the DBN performance is also compared with Position Specific Scoring Matrix (PSSM)-profile based feature representation. The importance of (i) structural information in protein feature representation and (ii) complementary small dataset learning approaches for detection of structural fold switching are demonstrated.
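For readers unfamiliar with the metric, the Q3 score quoted in this abstract is the percentage of residues whose predicted three-state label (helix, sheet or coil) matches the observed label. The following is a minimal sketch, assuming labels are encoded as the one-letter codes H, E and C; the function name `q3_score` and the example strings are illustrative and not taken from the paper.

```python
def q3_score(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (sheet), C (coil),
    one character per residue. This encoding is an assumption for illustration.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    # Count positions where the predicted state equals the observed state.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Hypothetical 8-residue example: 7 of 8 states agree.
    print(q3_score("HHHEECCC", "HHHEECCH"))
```

Under this definition, an improvement of "almost 3%" corresponds to roughly three additional correctly labeled residues per hundred.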
2. Akbar S, Pardasani KR, Panda NR. PSO Based Neuro-fuzzy Model for Secondary Structure Prediction of Protein. Neural Process Lett 2021. [DOI: 10.1007/s11063-021-10615-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 02/07/2023]
3. Krieger S, Kececioglu J. Boosting the accuracy of protein secondary structure prediction through nearest neighbor search and method hybridization. Bioinformatics 2021; 36:i317-i325. [PMID: 32657384 PMCID: PMC7355242 DOI: 10.1093/bioinformatics/btaa336] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Protein secondary structure prediction is a fundamental precursor to many bioinformatics tasks. Nearly all state-of-the-art tools when computing their secondary structure prediction do not explicitly leverage the vast number of proteins whose structure is known. Leveraging this additional information in a so-called template-based method has the potential to significantly boost prediction accuracy.
METHOD: We present a new hybrid approach to secondary structure prediction that gains the advantages of both template- and non-template-based methods. Our core template-based method is an algorithmic approach that uses metric-space nearest neighbor search over a template database of fixed-length amino acid words to determine estimated class-membership probabilities for each residue in the protein. These probabilities are then input to a dynamic programming algorithm that finds a physically valid maximum-likelihood prediction for the entire protein. Our hybrid approach exploits a novel accuracy estimator for our core method, which estimates the unknown true accuracy of its prediction, to discern when to switch between template- and non-template-based methods.
RESULTS: On challenging CASP benchmarks, the resulting hybrid approach boosts the state-of-the-art Q8 accuracy by more than 2-10%, and Q3 accuracy by more than 1-3%, yielding the most accurate method currently available for both 3- and 8-state secondary structure prediction.
AVAILABILITY AND IMPLEMENTATION: A preliminary implementation in a new tool we call Nnessy is available free for non-commercial use at http://nnessy.cs.arizona.edu.
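The dynamic programming step described in this abstract, which turns per-residue class probabilities into a valid maximum-likelihood labeling, can be illustrated with a Viterbi-style recurrence. This is only a sketch under stated assumptions: the abstract does not say which structures count as physically valid, so the rule used here (a helix residue may not sit directly next to a strand residue) is hypothetical, as are the function names and the example probabilities. It is not the Nnessy implementation.

```python
import math

STATES = "HEC"  # helix, strand, coil

# Hypothetical validity rule, for illustration only: helix and strand
# residues may not be adjacent without an intervening coil residue.
FORBIDDEN = {("H", "E"), ("E", "H")}


def safe_log(p: float) -> float:
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def best_valid_labeling(probs):
    """Return the most likely state string that obeys FORBIDDEN.

    probs is a list with one dict per residue, mapping each state in
    STATES to its estimated class-membership probability.
    """
    if not probs:
        return ""
    # score[i][s]: best log-likelihood of a valid labeling of residues 0..i ending in s.
    score = [{s: safe_log(probs[0][s]) for s in STATES}]
    back = []  # back[i-1][s]: best predecessor state of s at residue i
    for i in range(1, len(probs)):
        row, ptr = {}, {}
        for s in STATES:
            allowed = [p for p in STATES if (p, s) not in FORBIDDEN]
            prev = max(allowed, key=lambda p: score[-1][p])
            row[s] = score[-1][prev] + safe_log(probs[i][s])
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    example = [
        {"H": 0.7, "E": 0.2, "C": 0.1},
        {"H": 0.1, "E": 0.6, "C": 0.3},
    ]
    print(best_valid_labeling(example))
```

In the example, taking the most probable state at each residue independently would give the labeling "HE", which the hypothetical rule forbids; the recurrence instead returns the most likely labeling among the valid ones.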
Affiliation(s)
- Spencer Krieger
  - Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA
- John Kececioglu
  - Department of Computer Science, The University of Arizona, Tucson, AZ 85721, USA
4. Agrawal S, Ransom RF, Saraswathi S, Garcia-Gonzalo E, Webb A, Fernandez-Martinez JL, Popovic M, Guess AJ, Kloczkowski A, Benndorf R, Sadee W, Smoyer WE, on behalf of the Pediatric Nephrology Research Consortium (PNRC). Sulfatase 2 Is Associated with Steroid Resistance in Childhood Nephrotic Syndrome. J Clin Med 2021; 10:523. [PMID: 33540508 PMCID: PMC7867139 DOI: 10.3390/jcm10030523] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/30/2020] [Revised: 01/20/2021] [Accepted: 01/23/2021] [Indexed: 01/17/2023] Open
Abstract
Glucocorticoid (GC) resistance complicates the treatment of ~10-20% of children with nephrotic syndrome (NS), yet the molecular basis for resistance remains unclear. We used RNAseq analysis and in silico algorithm-based approaches on peripheral blood leukocytes from 12 children both at initial NS presentation and after ~7 weeks of GC therapy to identify a 12-gene panel able to differentiate steroid-resistant NS (SRNS) from steroid-sensitive NS (SSNS). Among this panel, subsequent validation and analyses of one biologically relevant candidate, sulfatase 2 (SULF2), in up to a total of 66 children, revealed that both SULF2 leukocyte expression and plasma arylsulfatase activity Post/Pre therapy ratios were greater in SSNS vs. SRNS. However, neither plasma SULF2 endosulfatase activity (measured by VEGF binding activity) nor plasma VEGF levels distinguished SSNS from SRNS, despite VEGF's reported role as a downstream mediator of SULF2's effects in glomeruli. Experimental studies of NS-related injury in both rat glomeruli and cultured podocytes also revealed decreased SULF2 expression, which was partially reversible by GC treatment of podocytes. These findings together suggest that SULF2 levels and activity are associated with GC resistance in NS, and that SULF2 may play a protective role in NS via the modulation of downstream mediators distinct from VEGF.
Affiliation(s)
- Shipra Agrawal
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Richard F. Ransom
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Saras Saraswathi
  - Battelle Center for Mathematical Medicine at Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Amy Webb
  - Department of Biomedical Informatics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Milan Popovic
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Adam J. Guess
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Andrzej Kloczkowski
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
  - Battelle Center for Mathematical Medicine at Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
- Rainer Benndorf
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- Wolfgang Sadee
  - Department of Cancer Biology and Genetics, Center for Pharmacogenomics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
- William E. Smoyer
  - Center for Clinical and Translational Research, Abigail Wexner Research Institute at Nationwide Children’s Hospital, Columbus, OH 43205, USA
  - Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH 43210, USA
5. Prediction of Protein Tertiary Structure via Regularized Template Classification Techniques. Molecules 2020; 25:molecules25112467. [PMID: 32466409 PMCID: PMC7321371 DOI: 10.3390/molecules25112467] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/23/2020] [Revised: 05/21/2020] [Accepted: 05/22/2020] [Indexed: 11/24/2022] Open
Abstract
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.
6. Álvarez Ó, Fernández-Martínez JL, Corbeanu AC, Fernández-Muñiz Z, Kloczkowski A. Predicting protein tertiary structure and its uncertainty analysis via particle swarm sampling. J Mol Model 2019; 25:79. [PMID: 30810816 PMCID: PMC7586042 DOI: 10.1007/s00894-019-3956-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/16/2018] [Accepted: 02/05/2019] [Indexed: 10/27/2022]
Abstract
We discuss the relationship between the problem of protein tertiary structure prediction from the amino acid sequence and the uncertainty analysis. The algorithm presented in this paper belongs to the category of decoy-based modeling, where different known protein models are used to establish a low dimensional space via principal component analysis. The low dimensional space is utilized to perform an energy optimization via a family of very explorative particle swarm optimizers to find the global minimum. The aim of this procedure is to get a representative sample of the nonlinear equivalent region, that is, protein models that have their energy lower than a certain energy bound. The posterior analysis of this family provides very valuable information about the backbone structure of the native conformation and its possible alternate states. This methodology has the advantage of being simple and fast and can help refine the tertiary protein structure. We comprehensively illustrate the performance of our algorithm on one protein from the CASP-9 protein structure prediction experiment. We also provide a theoretical analysis of the energy landscape found in the tertiary structure protein inverse problem, explaining why model reduction techniques (principal component analysis in this case) serve to alleviate the ill-posed character of this high dimensional optimization problem. In addition, we expand the computational benchmark with a summary of other CASP-9 proteins in the Appendix.
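The particle swarm optimization step that this abstract and several neighboring entries rely on can be sketched in a few lines. The sketch below is the textbook global-best PSO applied to a toy objective, not the explorative PSO family or the energy function used in the paper. The inertia and acceleration constants, the search bounds, and the names `pso_minimize` and `sphere` are assumptions chosen for illustration. In the paper's setting, the coordinates would be coefficients in the PCA-reduced space and the objective would be the protein energy.

```python
import random


def pso_minimize(objective, dim, n_particles=20, n_iters=200,
                 bounds=(-5.0, 5.0), seed=0):
    """Minimize objective over R^dim with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Commonly used constriction-style constants (an assumption, not from the paper).
    w, c1, c2 = 0.729, 1.49445, 1.49445

    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                  # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its global minimum of 0 at the origin."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = pso_minimize(sphere, dim=3)
    print(best, value)
```

Keeping every sampled position whose objective value falls below a chosen bound, rather than only the single best one, would give a sample of the low-energy region of the kind the abstract describes for uncertainty analysis.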
Affiliation(s)
- Óscar Álvarez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Juan Luis Fernández-Martínez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Ana Cernea Corbeanu
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Zulima Fernández-Muñiz
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007, Oviedo, Spain
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
7. Yang Y, Gao J, Wang J, Heffernan R, Hanson J, Paliwal K, Zhou Y. Sixty-five years of the long march in protein secondary structure prediction: the final stretch? Brief Bioinform 2018; 19:482-494. [PMID: 28040746 PMCID: PMC5952956 DOI: 10.1093/bib/bbw129] [Citation(s) in RCA: 89] [Impact Index Per Article: 12.7] [Received: 10/10/2016] [Revised: 11/15/2016] [Indexed: 11/13/2022] Open
Abstract
Protein secondary structure prediction began in 1951 when Pauling and Corey predicted helical and sheet conformations for the protein polypeptide backbone even before the first protein structure was determined. Sixty-five years later, powerful new methods breathe new life into this field. The highest three-state accuracy without relying on structure templates is now at 82-84%, a number unthinkable just a few years ago. These improvements came from increasingly larger databases of protein sequences and structures for training, the use of template secondary structure information and more powerful deep learning techniques. As we approach the theoretical limit of three-state prediction (88-90%), alternatives to secondary structure prediction (prediction of backbone torsion angles and Cα-atom-based angles and torsion angles) not only have more room for further improvement but also allow direct prediction of three-dimensional fragment structures with constantly improved accuracy. About 20% of all 40-residue fragments in a database of 1199 non-redundant proteins have <6 Å root-mean-squared distance from the native conformations by SPIDER2. More powerful deep learning methods with improved capability of capturing long-range interactions begin to emerge as the next generation of techniques for secondary structure prediction. The time has come to finish off the final stretch of the long march towards protein secondary structure prediction.
Affiliation(s)
- Yuedong Yang
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
- Jianzhao Gao
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China
- Jihua Wang
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Rhys Heffernan
  - Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Jack Hanson
  - Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Kuldip Paliwal
  - Signal Processing Laboratory, Griffith University, Brisbane, Australia
- Yaoqi Zhou
  - Institute for Glycomics and School of Information and Communication Technology, Griffith University, Parklands Drive, Southport, QLD, Australia
  - Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
8. Álvarez Ó, Fernández-Martínez JL, Fernández-Brillet C, Cernea A, Fernández-Muñiz Z, Kloczkowski A. Principal component analysis in protein tertiary structure prediction. J Bioinform Comput Biol 2018; 16:1850005. [PMID: 29566640 DOI: 10.1142/s0219720018500051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/18/2022]
Abstract
We discuss applicability of principal component analysis (PCA) for protein tertiary structure prediction from amino acid sequence. The algorithm presented in this paper belongs to the category of protein refinement models and involves establishing a low-dimensional space where the sampling (and optimization) is carried out via particle swarm optimizer (PSO). The reduced space is found via PCA performed for a set of low-energy protein models previously found using different optimization techniques. A high frequency term is added into this expansion by projecting the best decoy into the PCA basis set and calculating the residual model. This term is aimed at providing high frequency details in the energy optimization. The goal of this research is to analyze how the dimensionality reduction affects the prediction capability of the PSO procedure. For that purpose, different proteins from the Critical Assessment of Techniques for Protein Structure Prediction experiments were modeled. In all the cases, both the energy of the best decoy and the distance to the native structure have decreased. Our analysis also shows how the predicted backbone structure of native conformation and of alternative low energy states varies with respect to the PCA dimensionality. Generally speaking, the reconstruction can be successfully achieved with 10 principal components and the high frequency term. We also provide a computational analysis of protein energy landscape for the inverse problem of reconstructing structure from the reduced number of principal components, showing that the dimensionality reduction alleviates the ill-posed character of this high-dimensional energy optimization problem. The procedure explained in this paper is very fast and allows testing different PCA expansions. Our results show that PSO improves the energy of the best decoy used in the PCA when the adequate number of PCA terms is considered.
Affiliation(s)
- Óscar Álvarez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Juan Luis Fernández-Martínez
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Celia Fernández-Brillet
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Ana Cernea
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Zulima Fernández-Muñiz
  - Group of Inverse Problems, Optimization and Machine Learning, Department of Mathematics, University of Oviedo, C. Federico García Lorca, 18, 33007 Oviedo, Spain
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
  - Department of Pediatrics, The Ohio State University, Columbus, OH, USA
9. Protein secondary structure prediction: A survey of the state of the art. J Mol Graph Model 2017; 76:379-402. [DOI: 10.1016/j.jmgm.2017.07.015] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 06/05/2017] [Revised: 07/14/2017] [Accepted: 07/17/2017] [Indexed: 11/21/2022]
10. Rashid S, Saraswathi S, Kloczkowski A, Sundaram S, Kolinski A. Protein secondary structure prediction using a small training set (compact model) combined with a Complex-valued neural network approach. BMC Bioinformatics 2016; 17:362. [PMID: 27618812 PMCID: PMC5020447 DOI: 10.1186/s12859-016-1209-0] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 10/07/2015] [Accepted: 08/25/2016] [Indexed: 11/17/2022] Open
Abstract
BACKGROUND: Protein secondary structure prediction (SSP) has been an area of intense research interest. Despite advances in recent methods conducted on large datasets, the estimated upper limit accuracy is yet to be reached. Since the predictions of SSP methods are applied as input to higher-level structure prediction pipelines, even small errors may cause large perturbations in final models. Previous works relied on cross validation as an estimate of classifier accuracy. However, training on large numbers of protein chains compromises the classifier's ability to generalize to new sequences. This prompts a novel approach to training and an investigation into the possible structural factors that lead to poor predictions. Here, a small group of 55 proteins termed the compact model is selected from the CB513 dataset using a heuristics-based approach. In a prior work, all sequences were represented as probability matrices of residues adopting each of Helix, Sheet and Coil states, based on energy calculations using the C-Alpha, C-Beta, Side-chain (CABS) algorithm. The functional relationship between the conformational energies computed with the CABS force-field and residue states is approximated using a classifier termed the Fully Complex-valued Relaxation Network (FCRN). The FCRN is trained with the compact model proteins.
RESULTS: The performance of the compact model is compared with traditional cross-validated accuracies and blind-tested on a dataset of G Switch proteins, obtaining accuracies of ∼81%. The model demonstrates better results when compared to several techniques in the literature. A comparative case study of the worst performing chain identifies hydrogen bond contacts that lead to Coil ⇔ Sheet misclassifications. Overall, mispredicted Coil residues have a higher propensity to participate in backbone hydrogen bonding than correctly predicted Coils.
CONCLUSIONS: The implications of these findings are: (i) the choice of training proteins is important in preserving the generalization of a classifier to predict new sequences accurately and (ii) SSP techniques sensitive in distinguishing between backbone hydrogen bonding and side-chain or water-mediated hydrogen bonding might be needed in the reduction of Coil ⇔ Sheet misclassifications.
Affiliation(s)
- Shamima Rashid
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798 Singapore
- Saras Saraswathi
  - Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA
  - Sidra Medical and Research Center, Al Dafna, Doha, Qatar
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children’s Hospital, 700 Children’s Drive, Columbus, USA
  - Department of Paediatrics, College of Medicine, The Ohio State University, 370 W. 9th Avenue, Columbus, USA
- Suresh Sundaram
  - School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Ave, Singapore, 639798 Singapore
- Andrzej Kolinski
  - Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, Warsaw, 02-093 Poland
11. Patel MS, Mazumdar HS. Knowledge base and neural network approach for protein secondary structure prediction. J Theor Biol 2014; 361:182-9. [DOI: 10.1016/j.jtbi.2014.08.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/16/2013] [Revised: 08/01/2014] [Accepted: 08/04/2014] [Indexed: 10/24/2022]
12. Huang G, Huang GB, Song S, You K. Trends in extreme learning machines: a review. Neural Netw 2014; 61:32-48. [PMID: 25462632 DOI: 10.1016/j.neunet.2014.10.001] [Citation(s) in RCA: 487] [Impact Index Per Article: 44.3] [Received: 03/20/2014] [Revised: 08/25/2014] [Accepted: 10/02/2014] [Indexed: 01/29/2023]
Abstract
Extreme learning machine (ELM) has gained increasing interest from various research fields recently. In this review, we aim to report the current state of the theoretical research and practical advances on this subject. We first give an overview of ELM from the theoretical perspective, including the interpolation theory, universal approximation capability, and generalization ability. Then we focus on the various improvements made to ELM which further improve its stability, sparsity and accuracy under general or specific conditions. Apart from classification and regression, ELM has recently been extended for clustering, feature selection, representational learning and many other learning tasks. These newly emerging algorithms greatly expand the applications of ELM. From the implementation aspect, hardware implementation and parallel computation techniques have substantially sped up the training of ELM, making it feasible for big data processing and real-time reasoning. Due to its remarkable efficiency, simplicity, and impressive generalization performance, ELM has been applied in a variety of domains, such as biomedical engineering, computer vision, system identification, and control and robotics. In this review, we try to provide a comprehensive view of these advances in ELM together with its future perspectives.
Affiliation(s)
- Gao Huang
  - Department of Automation, Tsinghua University, Beijing 100084, China
13. Cartwright H, Curteanu S. Neural Networks Applied in Chemistry. II. Neuro-Evolutionary Techniques in Process Modeling and Optimization. Ind Eng Chem Res 2013. [DOI: 10.1021/ie4000954] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Affiliation(s)
- Hugh Cartwright
  - Physical and Theoretical Chemistry Laboratory, Oxford University, South Parks Road, Oxford, England OX1 3QZ
- Silvia Curteanu
  - Department of Chemical Engineering, “Gheorghe Asachi” Technical University Iasi, Bd. Prof. dr. doc. Dimitrie Mangeron, No. 73, 700050, Iasi, Romania
14. Saraswathi S, Fernández-Martínez JL, Koliński A, Jernigan RL, Kloczkowski A. Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure. J Mol Model 2013; 19:4337-48. [PMID: 23907551 DOI: 10.1007/s00894-013-1911-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/20/2013] [Accepted: 06/05/2013] [Indexed: 11/27/2022]
Abstract
Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of structures. As a result, the development of efficient and fast protein secondary structure prediction methods is essential for the broad comprehension of protein structures. Computational methods that can efficiently determine secondary structure can in turn facilitate protein tertiary structure prediction, since most methods rely initially on secondary structure predictions. Recently, we have developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated by using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are used with this data to obtain better and faster convergence to more accurate secondary structure predicted results. A five-fold cross-validated testing accuracy of 83.8 % and a segment overlap (SOV) score of 78.3 % are obtained in this study. Secondary structure predictions and their accuracy are usually presented for three secondary structure elements: α-helix, β-strand and coil but rarely have the results been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to provide detailed behaviors for different amino acid types in the secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify the correlation between these properties and prediction accuracy. The present detailed results suggest several important ways that secondary structure predictions can be improved in the future that might lead to improved protein design and engineering.
Affiliation(s)
- S Saraswathi
  - Battelle Center for Mathematical Medicine, The Research Institute at Nationwide Children's Hospital, 700 Children's Drive, Columbus, OH, USA
15. Zhou C, Hou C, Zhang Q, Wei X. Enhanced hybrid search algorithm for protein structure prediction using the 3D-HP lattice model. J Mol Model 2013; 19:3883-91. [PMID: 23824509 DOI: 10.1007/s00894-013-1907-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/20/2012] [Accepted: 05/30/2013] [Indexed: 10/26/2022]