1
|
Medvedeva A, Teimouri H, Kolomeisky AB. Predicting Antimicrobial Activity for Untested Peptide-Based Drugs Using Collaborative Filtering and Link Prediction. J Chem Inf Model 2023. [PMID: 37307501 DOI: 10.1021/acs.jcim.3c00137] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The increase of bacterial resistance to currently available antibiotics has underlined the urgent need to develop new antibiotic drugs. Antimicrobial peptides (AMPs), alone or in combination with other peptides and/or existing antibiotics, have emerged as promising candidates for this task. However, given that there are thousands of known AMPs and an even larger number can be synthesized, it is impossible to comprehensively test all of them using standard wet lab experimental methods. These observations stimulated an application of machine-learning methods to identify promising AMPs. Currently, machine learning studies combine very different bacteria without considering bacteria-specific features or interactions with AMPs. In addition, the sparsity of current AMP data sets disqualifies the application of traditional machine-learning methods or makes the results unreliable. Here, we present a new approach, featuring neighborhood-based collaborative filtering, to predict with high accuracy a given bacteria's response to untested AMPs based on similarities between bacterial responses. Furthermore, we also developed a complementary bacteria-specific link prediction approach that can be used to visualize networks of AMP-antibiotic combinations, enabling us to propose new combinations that are likely to be effective.
Affiliation(s)
- Angela Medvedeva
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Hamid Teimouri
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- Anatoly B Kolomeisky
  - Department of Chemistry, Rice University, Houston, Texas 77005, United States
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
  - Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas 77005, United States
  - Department of Physics and Astronomy, Rice University, Houston, Texas 77005, United States
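The neighborhood-based collaborative filtering idea described in the abstract of entry 1 can be illustrated with a minimal sketch. It is not the authors' implementation: the response matrix `R`, the cosine similarity, the neighborhood size `k`, and all function names are assumptions made for illustration. Rows stand for bacteria, columns for AMPs, and `None` marks an untested pair.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length numeric lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def predict_response(R, bacterium, peptide, k=2):
    """Estimate R[bacterium][peptide] from the k most similar bacteria."""
    target = R[bacterium]
    neighbours = []
    for other, row in enumerate(R):
        if other == bacterium or row[peptide] is None:
            continue  # a neighbour must have been tested on this peptide
        shared = [j for j in range(len(row))
                  if target[j] is not None and row[j] is not None]
        if len(shared) < 2:
            continue  # too little overlap to compare responses
        sim = cosine([target[j] for j in shared], [row[j] for j in shared])
        if sim > 0:
            neighbours.append((sim, row[peptide]))
    if not neighbours:
        known = [v for v in target if v is not None]
        return sum(known) / len(known) if known else None  # fallback: row mean
    top = sorted(neighbours, reverse=True)[:k]
    # Similarity-weighted average of the neighbours' measured responses.
    return sum(s * v for s, v in top) / sum(s for s, _ in top)

# Hypothetical toy data (not from the paper): 4 bacteria x 4 peptides.
R = [
    [1.0, 2.0, 3.0, None],
    [1.0, 2.0, 3.0, 4.0],
    [3.0, 2.0, 1.0, 0.5],
    [1.1, 2.1, 2.9, 4.2],
]
estimate = predict_response(R, bacterium=0, peptide=3, k=2)
```

In this toy matrix the two bacteria whose known responses most resemble those of bacterium 0 were measured at 4.0 and 4.2 on the untested peptide, so the estimate falls between those two values.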
|
2
|
Robust and accurate prediction of protein-protein interactions by exploiting evolutionary information. Sci Rep 2021; 11:16910. [PMID: 34413375 PMCID: PMC8376940 DOI: 10.1038/s41598-021-96265-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/13/2020] [Accepted: 04/15/2021] [Indexed: 02/07/2023] [Open Access]
Abstract
Various biochemical functions of organisms are performed by protein-protein interactions (PPIs). Therefore, recognition of protein-protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false positive rate. To find a better solution, the biology community is looking for computational methods to quickly and efficiently discover massive protein interaction data. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
|
3
|
Liu Z, Luo X, Wang Z. Convergence Analysis of Single Latent Factor-Dependent, Nonnegative, and Multiplicative Update-Based Nonnegative Latent Factor Models. IEEE Transactions on Neural Networks and Learning Systems 2021; 32:1737-1749. [PMID: 32396106 DOI: 10.1109/tnnls.2020.2990990] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/11/2023]
Abstract
A single latent factor (LF)-dependent, nonnegative, and multiplicative update (SLF-NMU) learning algorithm is highly efficient in building a nonnegative LF (NLF) model defined on a high-dimensional and sparse (HiDS) matrix. However, convergence characteristics of such NLF models are never justified in theory. To address this issue, this study conducts rigorous convergence analysis for an SLF-NMU-based NLF model. The main idea is twofold: 1) proving that its learning objective keeps nonincreasing with its SLF-NMU-based learning rules via constructing specific auxiliary functions; and 2) proving that it converges to a stable equilibrium point with its SLF-NMU-based learning rules via analyzing the Karush-Kuhn-Tucker (KKT) conditions of its learning objective. Experimental results on ten HiDS matrices from real applications provide numerical evidence that indicates the correctness of the achieved proof.
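The flavour of a nonnegative multiplicative update on a sparse matrix can be shown with a small sketch. This is a generic masked multiplicative update in the Lee and Seung style, applied only to observed entries; it is not claimed to be the exact SLF-NMU rule analysed in the paper, and the names `obs`, `P`, `Q`, `masked_loss`, and `multiplicative_step` are hypothetical. The demo tracks the squared error on known entries, which should not increase from step to step.

```python
import random

def masked_loss(obs, P, Q):
    """Squared error over the observed entries only."""
    return sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2
               for (u, i), r in obs.items())

def multiplicative_step(obs, P, Q):
    """One in-place update of P (with Q fixed), then of Q (with P fixed)."""
    rank = len(P[0])
    for rows, cols, axis in ((P, Q, 0), (Q, P, 1)):
        num = [[0.0] * rank for _ in rows]
        den = [[0.0] * rank for _ in rows]
        for key, r in obs.items():
            a, b = key[axis], key[1 - axis]
            r_hat = sum(x * y for x, y in zip(rows[a], cols[b]))
            for k in range(rank):
                num[a][k] += cols[b][k] * r      # pull toward the data
                den[a][k] += cols[b][k] * r_hat  # pull toward the current fit
        for a in range(len(rows)):
            for k in range(rank):
                if den[a][k] > 0:
                    # Multiplying by a nonnegative ratio keeps factors >= 0.
                    rows[a][k] *= num[a][k] / den[a][k]

# Hypothetical toy data: 6 known entries of a 3 x 3 matrix, rank-2 factors.
random.seed(0)
obs = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
       (1, 2): 1.0, (2, 1): 1.0, (2, 2): 5.0}
P = [[random.uniform(0.1, 1.0) for _ in range(2)] for _ in range(3)]
Q = [[random.uniform(0.1, 1.0) for _ in range(2)] for _ in range(3)]
losses = [masked_loss(obs, P, Q)]
for _ in range(50):
    multiplicative_step(obs, P, Q)
    losses.append(masked_loss(obs, P, Q))
```

Because every update multiplies a factor by a ratio of nonnegative sums, no projection or clipping step is needed to keep `P` and `Q` nonnegative.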
|
4
|
Li J, Shi X, You ZH, Yi HC, Chen Z, Lin Q, Fang M. Using Weighted Extreme Learning Machine Combined With Scale-Invariant Feature Transform to Predict Protein-Protein Interactions From Protein Evolutionary Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1546-1554. [PMID: 31940546 DOI: 10.1109/tcbb.2020.2965919] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 06/10/2023]
Abstract
Protein-Protein Interactions (PPIs) play an irreplaceable role in the biological activities of organisms. Although many high-throughput methods are used to identify PPIs in different kinds of organisms, they have shortcomings, such as being costly and time-consuming. To address these problems, computational methods have been developed to predict PPIs. In this paper, we present a method to predict PPIs using protein sequences. First, protein sequences are transformed into a Position Weight Matrix (PWM), from which the Scale-Invariant Feature Transform (SIFT) algorithm extracts features. Then Principal Component Analysis (PCA) is applied to reduce the dimension of the features. Finally, a Weighted Extreme Learning Machine (WELM) classifier is employed to predict PPIs, and a series of evaluation results are obtained. Since SIFT and WELM are used to extract features and to classify, respectively, we call the proposed method SIFT-WELM. When the proposed method is applied to three well-known PPI datasets of Yeast, Human and Helicobacter pylori, the average accuracies obtained with five-fold cross validation are as high as 94.83, 97.60 and 83.64 percent, respectively. To evaluate the proposed approach properly, we compare it with a Support Vector Machine (SVM) classifier and other recently developed methods in different aspects. Moreover, the training time of our method is greatly shortened compared with previous methods such as SVM, ACC, and PCVMZM.
|
5
|
The scleractinian Agaricia undata as a new host for the coral-gall crab Opecarcinus hypostegus at Bonaire, southern Caribbean. Symbiosis 2020. [DOI: 10.1007/s13199-020-00706-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/23/2022]
Abstract
The Caribbean scleractinian reef coral Agaricia undata (Agariciidae) is recorded for the first time as a host of the coral-gall crab Opecarcinus hypostegus (Cryptochiridae). The identity of the crab was confirmed with the help of DNA barcoding. The association has been documented with photographs taken in situ at 25 m depth and in the laboratory. The predominantly mesophotic depth range of the host species suggests this association to be present also at greater depths. With this record, all seven Agaricia species are now listed as gall-crab hosts, together with the agariciid Helioseris cucullata. Within the phylogeny of Agariciidae, Helioseris is not closely related to Agaricia. Therefore, the association between Caribbean agariciids and their gall-crab symbionts may either have originated early in their shared evolutionary history or later as a result of host range expansion. New information on coral-associated fauna, such as what is presented here, leads to a better insight on the diversity, evolution, and ecology of coral reef biota, particularly in the Caribbean, where cryptochirids have rarely been studied.
|
6
|
Yang F, Fan K, Song D, Lin H. Graph-based prediction of Protein-protein interactions with attributed signed graph embedding. BMC Bioinformatics 2020; 21:323. [PMID: 32693790 PMCID: PMC7372763 DOI: 10.1186/s12859-020-03646-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 08/22/2019] [Accepted: 07/08/2020] [Indexed: 12/12/2022] [Open Access]
Abstract
Background: Protein-protein interactions (PPIs) are central to many biological processes. Considering that the experimental methods for identifying PPIs are time-consuming and expensive, it is important to develop automated computational methods to better predict PPIs. Various machine learning methods have been proposed, including a sequence-based deep learning technique that has achieved promising results. However, it only focuses on sequence information while ignoring the structural information of PPI networks. Structural information of PPI networks, such as the degree, position, and neighboring nodes of a protein in a graph, has been proved to be informative in PPI prediction. Results: Facing the challenge of representing graph information, we introduce an improved graph representation learning method. Our model can study PPI prediction based on both sequence information and graph structure. Moreover, our study takes advantage of a representation learning model and employs a graph-based deep learning method for PPI prediction, which shows superiority over existing sequence-based methods. Our method achieves state-of-the-art accuracy of 99.15% on the Human Protein Reference Database (HPRD) dataset and also obtains the best results on the Database of Interacting Proteins (DIP) Human, Drosophila, Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans) datasets. Conclusion: Here, we introduce the signed variational graph auto-encoder (S-VGAE), an improved graph representation learning method, to automatically learn to encode graph structure into low-dimensional embeddings. Experimental results demonstrate that our method outperforms other existing sequence-based methods on several datasets. We also prove the robustness of our model for very sparse networks and its generalization to a new dataset that consists of four datasets: HPRD, E. coli, C. elegans, and Drosophila.
Affiliation(s)
- Fang Yang
  - School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
- Kunjie Fan
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Ohio, Columbus, 43210, USA
- Dandan Song
  - School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
- Huakang Lin
  - School of Computer Science and Technology, Beijing Institute of Technology, 5 South Zhongguancun Street, Haidian District, Beijing, 100081, China
|
7
|
Li P, Meng Y, Wang Y, Li J, Lam M, Wang L, Di LJ. Nuclear localization of Desmoplakin and its involvement in telomere maintenance. Int J Biol Sci 2019; 15:2350-2362. [PMID: 31595153 PMCID: PMC6775319 DOI: 10.7150/ijbs.34450] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 02/26/2019] [Accepted: 04/28/2019] [Indexed: 12/21/2022] [Open Access]
Abstract
The interaction between genomic DNA and protein fundamentally determines the activity and the function of DNA elements. Capturing the protein complex and identifying the proteins associated with a specific DNA locus is difficult. Herein, we employed CRISPR, the well-known gene-targeting tool, in combination with the proximity-dependent labeling tool BioID to capture the proteins associated with a specific genomic locus and to uncover novel functions of these proteins. By applying this research tool to telomeres, we identified DSP, out of many others, as a convincing telomere binding protein validated by both biochemical and cell-biological approaches. We also provide evidence that the C-terminal domain of DSP is required for its binding to the telomere after DSP translocates to the nucleus, a step mediated by the NLS sequence of DSP. In addition, we found that the telomere binding of DSP is telomere length dependent, as hTERT inhibition or knockdown caused a decrease of telomere length and diminished DSP binding to the telomere. Knockdown of TRF2 also negatively influenced DSP binding to the telomere. Functionally, loss of DSP resulted in shortened telomere DNA and induced the DNA damage response and cell apoptosis. In conclusion, our studies identified DSP as a novel potential telomere binding protein and highlighted its role in protecting against telomere DNA damage and resultant cell apoptosis.
Affiliation(s)
- Peipei Li
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Yuan Meng
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Yuan Wang
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
  - Metabolomics Core, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Jingjing Li
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
  - Metabolomics Core, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Manting Lam
  - Metabolomics Core, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Li Wang
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
  - Metabolomics Core, Faculty of Health Sciences, University of Macau, Macau, SAR of China
- Li-Jun Di
  - Cancer Center, Faculty of Health Sciences, University of Macau, Macau, SAR of China
|
8
|
Wang L, Wang HF, Liu SR, Yan X, Song KJ. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci Rep 2019; 9:9848. [PMID: 31285519 PMCID: PMC6614364 DOI: 10.1038/s41598-019-46369-4] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 03/05/2019] [Accepted: 06/10/2019] [Indexed: 01/09/2023] [Open Access]
Abstract
Protein is an essential component of the living organism. The prediction of protein-protein interactions (PPIs) has important implications for understanding the behavioral processes of life, preventing diseases, and developing new drugs. Although the development of high-throughput technology makes it possible to identify PPIs in large-scale biological experiments, the extensive use of experimental methods is restricted by constraints of time, cost, false positive rate and other conditions. Therefore, there is an urgent need for computational methods as a supplement to experimental methods to predict PPIs rapidly and accurately. In this paper, we propose a novel approach, namely CNN-FSRF, for predicting PPIs based on protein sequence by combining a deep learning Convolution Neural Network (CNN) with Feature-Selective Rotation Forest (FSRF). The proposed method first converts the protein sequence into the Position-Specific Scoring Matrix (PSSM) containing biological evolution information, then uses the CNN to objectively and efficiently extract the deeply hidden features of the protein, and finally removes redundant noise information with FSRF and gives the prediction results. When applied to the PPI datasets Yeast and Helicobacter pylori, CNN-FSRF achieved prediction accuracies of 97.75% and 88.96%, respectively. To further evaluate the prediction performance, we compared CNN-FSRF with SVM and other existing methods. In addition, we also verified the performance of CNN-FSRF on independent datasets. The experimental results indicate that CNN-FSRF can be used as a useful complement to biological experiments to identify protein interactions.
Affiliation(s)
- Lei Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
  - Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, P.R. China
- Hai-Feng Wang
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- San-Rong Liu
  - College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Xin Yan
  - School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Ke-Jian Song
  - School of information engineering, JiangXi University of Science and Technology, Ganzhou, Jiangxi, 341000, P.R. China
|
9
|
Tian B, Wu X, Chen C, Qiu W, Ma Q, Yu B. Predicting protein–protein interactions by fusing various Chou's pseudo components and using wavelet denoising approach. J Theor Biol 2019; 462:329-346. [DOI: 10.1016/j.jtbi.2018.11.011] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 09/13/2018] [Revised: 11/08/2018] [Accepted: 11/15/2018] [Indexed: 12/26/2022]
|
10
|
Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions. Sci Rep 2018; 8:12874. [PMID: 30150728 PMCID: PMC6110764 DOI: 10.1038/s41598-018-30694-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 03/15/2018] [Accepted: 07/17/2018] [Indexed: 11/09/2022] [Open Access]
Abstract
The interaction among proteins is essential in all life activities, and it is the basis of all the metabolic activities of the cells. By studying protein-protein interactions (PPIs), people can better interpret the function of proteins and decode the phenomena of life, which is of great practical value, especially in the design of new drugs. Although many high-throughput techniques have been devised for large-scale detection of PPIs, these methods are still expensive and time-consuming. For this reason, there is a strong need to develop computational methods for predicting PPIs at the entire proteome scale. In this article, we propose a new approach to predict PPIs using a Rotation Forest (RF) classifier combined with a matrix-based protein sequence representation. We apply the Position-Specific Scoring Matrix (PSSM), which contains biological evolution information, to represent protein sequences and extract the features through the two-dimensional Principal Component Analysis (2DPCA) algorithm. The descriptors are then sent to the rotation forest classifier for classification. We obtained 97.43% prediction accuracy with 94.92% sensitivity at a precision of 99.93% when the proposed method was applied to the PPI data of yeast. To evaluate the performance of the proposed method, we compared it with other methods on the same dataset and validated it on independent datasets. The results obtained show that the proposed method is an appropriate and promising method for predicting PPIs.
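The three figures quoted in the abstract above (accuracy, sensitivity, precision) are linked through the confusion matrix, and a short worked check shows how. The sketch assumes a balanced test set with equal numbers of interacting and non-interacting pairs; the abstract does not state this, so the class balance and the helper name `metrics` are assumptions for illustration only.

```python
def metrics(tp, fp, tn, fn):
    """Standard definitions from confusion-matrix counts (or fractions)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # also called recall
    precision = tp / (tp + fp)
    return accuracy, sensitivity, precision

# Reported values from the abstract.
sensitivity, precision = 0.9492, 0.9993

# Assumption: one unit of positives and one unit of negatives (balanced).
tp = sensitivity                          # fraction of positives recovered
fn = 1.0 - tp
fp = tp * (1.0 - precision) / precision   # from precision = tp / (tp + fp)
tn = 1.0 - fp

accuracy, sens_check, prec_check = metrics(tp, fp, tn, fn)
accuracy_percent = round(accuracy * 100, 2)
```

Under the balanced-class assumption the implied accuracy is about 97.43%, which agrees with the reported accuracy; with a different class ratio the three numbers would not line up in the same way.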
|
11
|
Zhan ZH, You ZH, Zhou Y, Li LP, Li ZW. Efficient Framework for Predicting ncRNA-Protein Interactions Based on Sequence Information by Deep Learning. Intelligent Computing Theories and Application 2018:337-344. [DOI: 10.1007/978-3-319-95933-7_41] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/30/2023]
|
12
|
Kang Q, Chen X, Li S, Zhou M. A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification. IEEE Transactions on Cybernetics 2017; 47:4263-4274. [PMID: 28113413 DOI: 10.1109/tcyb.2016.2606104] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Indexed: 06/06/2023]
Abstract
Under-sampling is a popular data preprocessing method in dealing with class imbalance problems, with the purposes of balancing datasets to achieve a high classification rate and avoiding the bias toward majority class examples. It always uses full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed by incorporating a noise filter before executing resampling. In order to verify the efficiency, this scheme is implemented based on four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble through benchmarks and significance analysis. Furthermore, this paper also summarizes the relationship between algorithm performance and imbalanced ratio. Experimental results indicate that the proposed scheme can improve the original undersampling-based methods with significance in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
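The idea of filtering noisy minority examples before under-sampling can be sketched as follows. The abstract does not specify which noise filter is used, so the nearest-neighbour rule here (drop a minority example when none of its `k` nearest neighbours is from the minority class) is an assumption, as are the function name and the toy data. The majority class is then randomly under-sampled to the size of the filtered minority class.

```python
import math
import random

def noise_filtered_undersample(X, y, minority=1, k=3, seed=0):
    """Drop isolated minority points, then balance by random under-sampling."""
    def nearest(idx):
        ranked = sorted((math.dist(X[idx], X[j]), j)
                        for j in range(len(X)) if j != idx)
        return [j for _, j in ranked[:k]]

    # Hypothetical noise filter: keep a minority example only if at least
    # one of its k nearest neighbours is also a minority example.
    kept_minority = [i for i in range(len(X))
                     if y[i] == minority
                     and any(y[j] == minority for j in nearest(i))]
    majority = [i for i in range(len(X)) if y[i] != minority]
    rng = random.Random(seed)
    sampled = rng.sample(majority, min(len(majority), len(kept_minority)))
    keep = sorted(kept_minority + sampled)
    return [X[i] for i in keep], [y[i] for i in keep]

# Toy data: 8 majority points near the origin, 3 minority points near (5, 5),
# and one minority point (0.1, 0.1) sitting inside the majority cluster.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0.5), (0.5, 0), (-1, 0), (0, -1),
     (5, 5), (5, 6), (6, 5), (0.1, 0.1)]
y = [0] * 8 + [1] * 4
X_bal, y_bal = noise_filtered_undersample(X, y)
```

On this toy set the isolated minority point is discarded and the result contains three examples of each class. Filtering first matters because plain under-sampling keeps every minority example, including mislabeled or outlying ones.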
|
13
|
Mirza MA, Li S, Jin L. Simultaneous learning and control of parallel Stewart platforms with unknown parameters. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.05.026] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 10/19/2022]
|
14
|
Li JQ, You ZH, Li X, Ming Z, Chen X. PSPEL: In Silico Prediction of Self-Interacting Proteins from Amino Acids Sequences Using Ensemble Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2017; 14:1165-1172. [PMID: 28092572 DOI: 10.1109/tcbb.2017.2649529] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 06/06/2023]
Abstract
Self-interacting proteins (SIPs) play an important role in various aspects of the structural and functional organization of the cell. Detecting SIPs is one of the most important issues in current molecular biology. Although a large amount of SIP data has been generated by experimental methods, wet laboratory approaches are both time-consuming and costly. In addition, they yield high false negative and false positive rates. Thus, there is a great need for in silico methods to predict SIPs accurately and efficiently. In this study, a new sequence-based method is proposed to predict SIPs. The evolutionary information contained in the Position-Specific Scoring Matrix (PSSM) is extracted from proteins with known sequences. Then, features are fed to an ensemble classifier to distinguish self-interacting from non-self-interacting proteins. When applied to Saccharomyces cerevisiae and Human SIP data sets, the proposed method achieves high accuracies of 86.86 and 91.30 percent, respectively. Our method also shows good performance when compared with the SVM classifier and previous methods. Consequently, the proposed method can be considered a promising new tool to predict SIPs.
|
15
|
Huang J, Vendramin S, Shi L, McGinnis KM. Construction and Optimization of a Large Gene Coexpression Network in Maize Using RNA-Seq Data. Plant Physiology 2017; 175:568-583. [PMID: 28768814 PMCID: PMC5580776 DOI: 10.1104/pp.17.00825] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 06/19/2017] [Accepted: 07/31/2017] [Indexed: 05/22/2023]
Abstract
With the emergence of massively parallel sequencing, genomewide expression data production has reached an unprecedented level. This abundance of data has greatly facilitated maize research, but may not be amenable to traditional analysis techniques that were optimized for other data types. Using publicly available data, a gene coexpression network (GCN) can be constructed and used for gene function prediction, candidate gene selection, and improving understanding of regulatory pathways. Several GCN studies have been done in maize (Zea mays), mostly using microarray datasets. To build an optimal GCN from plant materials RNA-Seq data, parameters for expression data normalization and network inference were evaluated. A comprehensive evaluation of these two parameters and a ranked aggregation strategy on network performance, using libraries from 1266 maize samples, were conducted. Three normalization methods and 10 inference methods, including six correlation and four mutual information methods, were tested. The three normalization methods had very similar performance. For network inference, correlation methods performed better than mutual information methods at some genes. Increasing sample size also had a positive effect on GCN. Aggregating single networks together resulted in improved performance compared to single networks.
Affiliation(s)
- Ji Huang
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306
- Stefania Vendramin
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306
- Lizhen Shi
  - Department of Computer Science, Florida State University, Tallahassee, Florida 32306
- Karen M McGinnis
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306
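A correlation-based gene coexpression network of the kind discussed in the abstract of entry 15 can be sketched in a few lines. This shows only plain Pearson correlation with a fixed cutoff; the gene names, expression values, and the 0.9 threshold are hypothetical, and the sketch does not reproduce the normalization, mutual information, or rank aggregation steps evaluated in the paper.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length expression profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def coexpression_edges(expression, threshold=0.9):
    """Link two genes when |r| between their profiles reaches the cutoff."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges

# Hypothetical expression values for three genes across four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.1],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(expression)
```

Here only geneA and geneB are connected, since their profiles rise together across samples while geneC is weakly and negatively correlated with both.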
|
16
|
|
17
|
Li P, Li J, Wang L, Di LJ. Proximity Labeling of Interacting Proteins: Application of BioID as a Discovery Tool. Proteomics 2017; 17. [DOI: 10.1002/pmic.201700002] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 01/03/2017] [Revised: 02/24/2017] [Indexed: 12/31/2022]
Affiliation(s)
- Peipei Li
  - Cancer Center; Faculty of Health Sciences; University of Macau; Macau SAR of China
- Jingjing Li
  - Cancer Center; Faculty of Health Sciences; University of Macau; Macau SAR of China
- Li Wang
  - Cancer Center; Faculty of Health Sciences; University of Macau; Macau SAR of China
  - Metabolomics Core; Faculty of Health Sciences; University of Macau; Macau SAR of China
- Li-Jun Di
  - Cancer Center; Faculty of Health Sciences; University of Macau; Macau SAR of China
|
18
|
Zhang J, Wang Y, Wang C, Zhou M. Symmetrical Hierarchical Stochastic Searching on the Line in Informative and Deceptive Environments. IEEE Transactions on Cybernetics 2017; 47:626-635. [PMID: 28113486 DOI: 10.1109/tcyb.2016.2521859] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/06/2023]
Abstract
A stochastic point location (SPL) problem aims to find a target parameter on a 1-D line by operating a controlled random walk and receiving information from a stochastic environment (SE). If the target parameter changes randomly, we call the parameter dynamic; otherwise static. SE can be 1) informative (p > 0.5 where p represents the probability for an environment providing a correct suggestion) and 2) deceptive (p < 0.5). Up till now, hierarchical stochastic searching on the line (HSSL) is the most efficient algorithm for locating a static or dynamic parameter in an informative environment, but it is unable to locate the target parameter in a deceptive environment or to recognize an environment's type (informative or deceptive). This paper presents a novel solution, named symmetrical HSSL, by extending an HSSL binary tree-based search structure to a symmetrical form. In this way, the proposed learning mechanism is able to converge to a static or dynamic target parameter in the range of not only 0.6181 < p < 1, but also 0 < p < 0.382. Finally, the experimental results show that our scheme is efficient and feasible to solve the SPL problem in any SE.
|
19
|
You ZH, Zhou M, Luo X, Li S. Highly Efficient Framework for Predicting Interactions Between Proteins. IEEE Transactions on Cybernetics 2017; 47:731-743. [PMID: 28113829 DOI: 10.1109/tcyb.2016.2524994] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Indexed: 06/06/2023]
Abstract
Protein-protein interactions (PPIs) play a central role in many biological processes. Although a large amount of human PPI data has been generated by high-throughput experimental techniques, they are very limited compared to the estimated 130 000 protein interactions in humans. Hence, automatic methods for human PPI-detection are highly desired. This work proposes a novel framework, i.e., Low-rank approximation-kernel Extreme Learning Machine (LELM), for detecting human PPI from a protein's primary sequences automatically. It has three main steps: 1) mapping each protein sequence into a matrix built on all kinds of adjacent amino acids; 2) applying the low-rank approximation model to the obtained matrix to solve its lowest rank representation, which reflects its true subspace structures; and 3) utilizing a powerful kernel extreme learning machine to predict the probability for PPI based on this lowest rank representation. Experimental results on a large-scale human PPI dataset demonstrate that the proposed LELM has significant advantages in accuracy and efficiency over the state-of-art approaches. Hence, this work establishes a new and effective way for the automatic detection of PPI.
|
20
|
Wang L, You ZH, Chen X, Li JQ, Yan X, Zhang W, Huang YA. An ensemble approach for large-scale identification of protein-protein interactions using the alignments of multiple sequences. Oncotarget 2017; 8:5149-5159. [PMID: 28029645 PMCID: PMC5354898 DOI: 10.18632/oncotarget.14103] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 10/11/2016] [Accepted: 11/15/2016] [Indexed: 11/25/2022] [Open Access]
Abstract
Protein-protein interactions (PPIs) are not only a critical component of various biological processes in cells, but also the key to understanding the mechanisms leading to healthy and diseased states in organisms. However, identifying the interactions among proteins using biological experiments is time-consuming and cost-intensive. Hence, developing more efficient computational methods rapidly became an attractive topic in the post-genomic era. In this paper, we propose a novel method for inferring protein-protein interactions from protein amino acid sequences only. Specifically, the protein amino acid sequence is first transformed into a Position-Specific Scoring Matrix (PSSM) generated by multiple sequence alignments; then the Pseudo PSSM is used to extract feature descriptors. Finally, an ensemble Rotation Forest (RF) learning system is trained to predict and recognize PPIs based solely on protein sequence features. When applied to three benchmark data sets (Yeast, H. pylori, and an independent dataset) for predicting PPIs, our method achieves good average accuracies of 98.38%, 89.75%, and 96.25%, respectively. To further evaluate the prediction performance, we also compare the proposed method with other methods using the same benchmark data sets. The experimental results demonstrate that the proposed method consistently outperforms other state-of-the-art methods. Therefore, our method is effective and robust and can be taken as a useful tool in exploring and discovering new relationships between proteins. A web server is made publicly available at the URL http://202.119.201.126:8888/PsePSSM/ for academic use.
Affiliation(s)
- Lei Wang: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China; College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Zhu-Hong You: Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi 830011, China
- Xing Chen: School of Information and Electrical Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Jian-Qiang Li: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Xin Yan: School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Wei Zhang: College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong 277100, China
- Yu-An Huang: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
|
21
|
Lim H, Gray P, Xie L, Poleksic A. Improved genome-scale multi-target virtual screening via a novel collaborative filtering approach to cold-start problem. Sci Rep 2016; 6:38860. [PMID: 27958331 PMCID: PMC5153628 DOI: 10.1038/srep38860] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 08/22/2016] [Accepted: 11/15/2016] [Indexed: 12/18/2022]
Abstract
The conventional one-drug-one-gene approach has had limited success in modern drug discovery. Polypharmacology, which focuses on searching for multi-targeted drugs to perturb disease-causing networks instead of designing selective ligands to target individual proteins, has emerged as a new drug discovery paradigm. Although many methods for single-target virtual screening have been developed to improve the efficiency of drug discovery, few of these algorithms are designed for polypharmacology. Here, we present a novel theoretical framework and a corresponding algorithm for genome-scale multi-target virtual screening based on the one-class collaborative filtering technique. Our method overcomes the sparseness of the protein-chemical interaction data by means of interaction matrix weighting and dual regularization from both chemicals and proteins. While the statistical foundation behind our method is general enough to encompass genome-wide drug off-target prediction, the program is specifically tailored to find protein targets for new chemicals with little to no available interaction data. We extensively evaluate our method using a number of the most widely accepted gene-specific and cross-gene family benchmarks and demonstrate that our method outperforms other state-of-the-art algorithms for predicting the interaction of new chemicals with multiple proteins. Thus, the proposed algorithm may provide a powerful tool for multi-target drug design.
Affiliation(s)
- Hansaim Lim: Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States
- Paul Gray: Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
- Lei Xie: Department of Computer Science, Hunter College, The City University of New York, New York, New York 10065, United States; Ph.D. Program in Computer Science, Biochemistry and Biology, The Graduate Center, The City University of New York, New York, New York 10065, United States
- Aleksandar Poleksic: Department of Computer Science, University of Northern Iowa, Cedar Falls, Iowa 50614, United States
|
22
|
Sze-To A, Fung S, Lee ESA, Wong AK. Prediction of Protein–Protein Interaction via co-occurring Aligned Pattern Clusters. Methods 2016; 110:26-34. [DOI: 10.1016/j.ymeth.2016.07.018] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/16/2016] [Revised: 06/25/2016] [Accepted: 07/26/2016] [Indexed: 10/21/2022]
|
23
|
Huang XT, Zhu Y, Chan LLH, Zhao Z, Yan H. An integrative C. elegans protein-protein interaction network with reliability assessment based on a probabilistic graphical model. Molecular BioSystems 2016; 12:85-92. [PMID: 26555698 DOI: 10.1039/c5mb00417a] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
In Caenorhabditis elegans, a large number of protein-protein interactions (PPIs) have been identified by different experiments. However, a comprehensive weighted PPI network, which is essential for signaling pathway inference, is not yet available in this model organism. Therefore, we first construct an integrative PPI network in C. elegans with 12,951 interactions involving 5039 proteins from seven molecular interaction databases. Then, a reliability score based on a probabilistic graphical model (RSPGM) is proposed to assess PPIs. It assumes that the random number of interactions between two proteins comes from the Bernoulli distribution to avoid multi-links. The main parameter of the RSPGM score contains a few latent variables, which can be considered as several common properties between two proteins. Validations on high-confidence yeast datasets show that RSPGM provides more accurate evaluation than other approaches, and that the PPIs in the reconstructed PPI network have higher biological relevance than those in the original network in terms of gene ontology, gene expression, essentiality and the prediction of known protein complexes. Furthermore, this weighted integrative PPI network in C. elegans is also used to infer the interaction path of the canonical Wnt/β-catenin pathway. Most genes on the inferred interaction path have been validated to be Wnt pathway components. Therefore, RSPGM is essential and effective for evaluating PPIs and inferring interaction paths. Finally, the PPI network with RSPGM scores can be queried and visualized on a user interactive website, which is freely available at .
Affiliation(s)
- Xiao-Tai Huang: Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Yuan Zhu: Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China; School of Automation, China University of Geosciences, Wuhan, China
- Leanne Lai Hang Chan: Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Zhongying Zhao: Department of Biology, Faculty of Science, Hong Kong Baptist University, Hong Kong, China
- Hong Yan: Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
|
24
|
|
25
|
Huang YA, You ZH, Gao X, Wong L, Wang L. Using Weighted Sparse Representation Model Combined with Discrete Cosine Transformation to Predict Protein-Protein Interactions from Protein Sequence. BioMed Research International 2015; 2015:902198. [PMID: 26634213 PMCID: PMC4641304 DOI: 10.1155/2015/902198] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 08/13/2015] [Accepted: 10/04/2015] [Indexed: 01/08/2023]
Abstract
Increasing demand for knowledge about protein-protein interactions (PPIs) is promoting the development of methods for predicting protein interaction networks. Although high-throughput technologies have generated considerable PPI data for various organisms, they have inevitable drawbacks such as high cost, time consumption, and an inherently high false positive rate. For this reason, computational methods for predicting PPIs are drawing more and more attention. In this study, we report a computational method for predicting PPIs using the information of protein sequences. The main improvements come from adopting a novel protein sequence representation by using the discrete cosine transform (DCT) on the substitution matrix representation (SMR) and from using a weighted sparse representation based classifier (WSRC). When applied to the PPI datasets of Yeast, Human, and H. pylori, the method achieved average accuracies as high as 96.28%, 96.30%, and 86.74%, respectively, significantly better than previous methods. These promising results indicate that the proposed method is feasible, robust, and powerful. To further evaluate the proposed method, we compared it with the state-of-the-art support vector machine (SVM) classifier. Extensive experiments were also performed in which we used Yeast PPI samples as the training set to predict the PPIs of five other species datasets.
Affiliation(s)
- Yu-An Huang: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Zhu-Hong You: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116, China
- Xin Gao: Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Suzhou, Jiangsu 215163, China
- Leon Wong: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060, China
- Lirong Wang: School of Electronic and Information Engineering, Soochow University, Suzhou, Jiangsu 215123, China
|
26
|
Huang Q, You Z, Zhang X, Zhou Y. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation. Int J Mol Sci 2015; 16:10855-69. [PMID: 25984606 PMCID: PMC4463679 DOI: 10.3390/ijms160510855] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 04/09/2015] [Revised: 05/06/2015] [Accepted: 05/07/2015] [Indexed: 01/22/2023]
Abstract
With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome. Therefore, research on protein–protein interactions (PPIs) is becoming more and more important. Life activities such as DNA synthesis, gene transcription activation, and protein translation are inseparable from protein–protein interactions. Though many methods based on biological experiments and machine learning have been proposed, they all require a long time to learn and achieve limited accuracy. How to efficiently and accurately predict PPIs is still a big challenge. To take up this challenge, we developed a new predictor by incorporating the reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and using weighted sparse representation-based classification (WSRC). The main advantage of introducing the reduced amino acid alphabet is that it avoids the notorious dimensionality disaster or overfitting problem in statistical prediction. Additionally, experiments have shown that our method achieves good performance in both low- and high-dimensional feature spaces. Among all of the experiments performed on the PPI data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and an 83.43% Matthews correlation coefficient (MCC) value. To evaluate the prediction ability of our method, extensive experiments were performed to compare it with the state-of-the-art technique, the support vector machine (SVM). The results show that the proposed approach is very promising for predicting PPIs and can be a helpful supplement for PPI prediction.
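The four figures of merit quoted in this abstract (accuracy, sensitivity, precision and MCC) all derive from the counts of a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, precision and MCC from confusion-matrix counts."""

    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    # Denominator of the Matthews correlation coefficient.
    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts for illustration only (not results from the paper).
metrics = binary_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when positive and negative interaction pairs are imbalanced.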
Affiliation(s)
- Qiaoying Huang: Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China
- Zhuhong You: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Xiaofeng Zhang: Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China
- Yong Zhou: School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
|
27
|
You ZH, Chan KCC, Hu P. Predicting protein-protein interactions from primary protein sequences using a novel multi-scale local feature representation scheme and the random forest. PLoS One 2015; 10:e0125811. [PMID: 25946106 PMCID: PMC4422660 DOI: 10.1371/journal.pone.0125811] [Citation(s) in RCA: 92] [Impact Index Per Article: 10.2] [Received: 07/14/2014] [Accepted: 03/04/2015] [Indexed: 11/18/2022]
Abstract
The study of protein-protein interactions (PPIs) can be very important for the understanding of biological cellular functions. However, detecting PPIs in the laboratory is both time-consuming and expensive. For this reason, there has been much recent effort to develop techniques for computational prediction of PPIs, as this can complement laboratory procedures and provide an inexpensive way of predicting the most likely set of interactions at the entire proteome scale. Although much progress has already been achieved in this direction, the problem is still far from being solved. More effective approaches are still required to overcome the limitations of the current ones. In this study, a novel Multi-scale Local Descriptor (MLD) feature representation scheme is proposed to extract features from a protein sequence. This scheme can capture multi-scale local information by varying the length of protein-sequence segments. Based on the MLD, an ensemble learning method, the Random Forest (RF) method, is used as the classifier. The MLD feature representation scheme facilitates the mining of interaction information from multi-scale continuous amino acid segments, making it easier to capture multiple overlapping continuous binding patterns within a protein sequence. When the proposed method is tested with the PPI data of Saccharomyces cerevisiae, it achieves a prediction accuracy of 94.72% with 94.34% sensitivity at a precision of 98.91%. Extensive experiments are performed to compare our method with existing sequence-based methods. Experimental results show that our predictor also performs better than several other state-of-the-art predictors on the H. pylori dataset. These good results can largely be credited to the learning capabilities of the RF model and the novel MLD feature representation scheme. The experimental results show that the proposed approach can be very promising for predicting PPIs and can be a useful tool for future proteomic studies.
Affiliation(s)
- Zhu-Hong You: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; School of Electronics and Information Engineering, Tongji University, Shanghai, China
- Keith C C Chan: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
- Pengwei Hu: Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China
|
28
|
Detection of Protein-Protein Interactions from Amino Acid Sequences Using a Rotation Forest Model with a Novel PR-LPQ Descriptor. Lecture Notes in Computer Science 2015. [DOI: 10.1007/978-3-319-22053-6_75] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 12/05/2022]
|