1
|
Performance of soft sensors based on stochastic configuration networks with nonnegative garrote. Neural Comput Appl 2022. [DOI: 10.1007/s00521-022-07254-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
2
|
Zhao YW, Zhang S, Ding H. Recent development of machine learning methods in sumoylation sites prediction. Curr Med Chem 2021; 29:894-907. [PMID: 34525906 DOI: 10.2174/0929867328666210915112030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/10/2021] [Revised: 07/24/2021] [Accepted: 08/07/2021] [Indexed: 11/22/2022]
Abstract
Sumoylation is an important reversible post-translational modification of proteins that mediates a variety of cellular processes. SUMO-modified proteins can change their subcellular localization, activity and stability. Sumoylation also plays an important role in cellular processes such as transcriptional regulation and signal transduction. Abnormal sumoylation is involved in many diseases, including neurodegeneration and immune-related diseases, as well as the development of cancer. Therefore, identification of sumoylation sites (SUMO sites) is fundamental to understanding their molecular mechanisms and regulatory roles. In contrast to labor-intensive and costly experimental approaches, in silico prediction of sumoylation sites has attracted much attention for its accuracy, convenience and speed. Many computational prediction models have been used to identify SUMO sites, but they have not been comprehensively summarized and reviewed. This paper therefore summarizes and discusses the research progress of the relevant models. We briefly summarize the development of bioinformatics methods for sumoylation site prediction, focusing on benchmark dataset construction, feature extraction, machine learning methods, published results and online tools. We hope the review will be helpful to wet-lab researchers.
Affiliation(s)
- Yi-Wei Zhao
  - School of Medicine, University of Electronic Science and Technology of China, Chengdu 610054, China
- Shihua Zhang
  - College of Life Science and Health, Wuhan University of Science and Technology, Wuhan 430065, China
- Hui Ding
  - School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
|
3
|
Lu L, Wang D, Wang L, E L, Guo P, Li Z, Xiang J, Yang H, Li H, Yin S, Schwartz LH, Xie C, Zhao B. A quantitative imaging biomarker for predicting disease-free-survival-associated histologic subgroups in lung adenocarcinoma. Eur Radiol 2020; 30:3614-3623. [PMID: 32086583 DOI: 10.1007/s00330-020-06663-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 09/18/2019] [Revised: 12/11/2019] [Accepted: 01/17/2020] [Indexed: 12/28/2022]
Abstract
OBJECTIVES: Classification of histologic subgroups has significant prognostic value for lung adenocarcinoma patients who undergo surgical resection. However, clinical histopathology assessment is generally performed on only a small portion of the overall tumor from biopsy or surgery. Our objective is to identify a noninvasive quantitative imaging biomarker (QIB) for the classification of histologic subgroups in lung adenocarcinoma patients.
METHODS: We retrospectively collected and reviewed 1313 CT scans of patients with resected lung adenocarcinomas from two geographically distant institutions who were seen between January 2014 and October 2017. Three study cohorts, the training, internal validation, and external validation cohorts, were created, within which lung adenocarcinomas were divided into two disease-free-survival (DFS)-associated histologic subgroups, the mid/poor and good DFS groups. A comprehensive machine learning- and deep learning-based analytical system was adopted to identify reproducible QIBs and help to understand QIBs' significance.
RESULTS: Intensity-Skewness, a QIB quantifying tumor density distribution, was identified as the optimal biomarker for predicting histologic subgroups. Intensity-Skewness achieved high AUCs (95% CI) of 0.849 (0.813, 0.881), 0.820 (0.781, 0.856) and 0.863 (0.827, 0.895) on the training, internal validation, and external validation cohorts, respectively. A criterion of Intensity-Skewness ≤ 1.5, which indicated high tumor density, showed high specificity of 96% (sensitivity 46%) and 99% (sensitivity 53%) on predicting the mid/poor DFS group in the training and external validation cohorts, respectively.
CONCLUSIONS: A QIB derived from routinely acquired CT was able to predict lung adenocarcinoma histologic subgroups, providing a noninvasive method that could potentially benefit personalized treatment decision-making for lung cancer patients.
KEY POINTS:
• A noninvasive imaging biomarker, Intensity-Skewness, which described the distortion of pixel-intensity distribution within lesions on CT images, was identified as a biomarker to predict disease-free-survival-associated histologic subgroups in lung adenocarcinoma.
• An Intensity-Skewness of ≤ 1.5 has high specificity in predicting the mid/poor disease-free survival histologic patient group in both the training cohort and the external validation cohort.
• The Intensity-Skewness is a feature that can be automatically computed with high reproducibility and robustness.
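As a rough illustration of the Intensity-Skewness criterion above, the following Python sketch computes the skewness of a list of lesion pixel intensities and applies the ≤ 1.5 cutoff quoted in the abstract. It assumes the standard moment-based (Fisher-Pearson) skewness; the abstract does not give the exact formula, the segmentation step, or any intensity preprocessing, so the function names and the input format are hypothetical.

```python
def intensity_skewness(intensities):
    """Moment-based skewness m3 / m2**1.5 of a sequence of pixel intensities.

    Assumption: population (biased) moments are used; the paper's exact
    definition is not stated in the abstract.
    """
    n = len(intensities)
    if n == 0:
        raise ValueError("at least one intensity value is required")
    mean = sum(intensities) / n
    m2 = sum((v - mean) ** 2 for v in intensities) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in intensities) / n  # third central moment
    if m2 == 0:
        return 0.0  # constant region: no asymmetry to measure
    return m3 / m2 ** 1.5


def predicts_mid_poor_dfs(intensities, threshold=1.5):
    """Hypothetical helper: apply the abstract's 'Intensity-Skewness <= 1.5' rule."""
    return intensity_skewness(intensities) <= threshold
```

A symmetric intensity distribution gives a skewness of zero, while a lesion with a few very bright pixels among many dark ones gives a large positive value and would fall above the cutoff.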
Affiliation(s)
- Lin Lu
  - Department of Radiology, Columbia University Medical Center, 710 West 168th Street, B26, New York, NY, 10032, USA
- Deling Wang
  - Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Lili Wang
  - Department of Molecular Pathology, the Affiliated Hospital of Qingdao University, Qingdao University, Wutaishan Road 1677, Qingdao, 266000, Shandong, People's Republic of China
- Linning E
  - Department of Radiology, Shanxi BETHUNE Hospital, 99 Longcheng Street, Taiyuan, 030032, Shanxi, People's Republic of China
- Pingzhen Guo
  - Department of Radiology, Columbia University Medical Center, 710 West 168th Street, B26, New York, NY, 10032, USA
- Zhiming Li
  - Department of Radiology, the Affiliated Hospital of Qingdao University, Qingdao University, Wutaishan Road 1677, Qingdao, 266000, Shandong, People's Republic of China
- Jin Xiang
  - Department of Pathology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Hao Yang
  - Department of Radiology, Columbia University Medical Center, 710 West 168th Street, B26, New York, NY, 10032, USA
- Hui Li
  - Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Shaohan Yin
  - Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Lawrence H Schwartz
  - Department of Radiology, Columbia University Medical Center, 710 West 168th Street, B26, New York, NY, 10032, USA
- Chuanmiao Xie
  - Department of Radiology, State Key Laboratory of Oncology in South China, Collaborative Innovation Center for Cancer Medicine, Sun Yat-sen University Cancer Center, Guangzhou, 510060, People's Republic of China
- Binsheng Zhao
  - Department of Radiology, Columbia University Medical Center, 710 West 168th Street, B26, New York, NY, 10032, USA
|
4
|
SUMOgo: Prediction of sumoylation sites on lysines by motif screening models and the effects of various post-translational modifications. Sci Rep 2018; 8:15512. [PMID: 30341374 PMCID: PMC6195521 DOI: 10.1038/s41598-018-33951-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 01/25/2018] [Accepted: 10/08/2018] [Indexed: 12/14/2022] Open
Abstract
Most modern tools used to predict sites of small ubiquitin-like modifier (SUMO) binding (referred to as SUMOylation) use algorithms, chemical features of the protein, and consensus motifs. However, these tools rarely consider how post-translational modification (PTM) information for other sites within the same protein influences the accuracy of prediction results. This study applied the Random Forest machine learning method, as well as motif screening models and a feature selection combination mechanism, to develop a SUMOylation prediction system, referred to as SUMOgo. In the prediction method, PTM sites were coded as new functional features in addition to structural features, such as sequence-based binary coding, encoded chemical features of proteins, and encoded secondary structure information that is important for PTM. Twenty cycles of prediction were conducted with a 1:1 combination of positive test data and random negative data. The Matthews correlation coefficient of SUMOgo reached 0.511, which is higher than that of current commonly used tools. This study further verified the important role of PTM in SUMOgo and includes a case study on CREB binding protein (CREBBP). The website for the final tool is http://predictor.nchu.edu.tw/SUMOgo.
|
5
|
Wu CY, Li QZ, Feng ZX. Non-coding RNA identification based on topology secondary structure and reading frame in organelle genome level. Genomics 2015; 107:9-15. [PMID: 26697761 DOI: 10.1016/j.ygeno.2015.12.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/16/2015] [Revised: 12/08/2015] [Accepted: 12/12/2015] [Indexed: 10/22/2022]
Abstract
Non-coding RNA (ncRNA) genes are transcribed just as protein-coding genes are, but ncRNAs function directly as RNAs rather than serving as blueprints for proteins. As the function of ncRNA is closely related to organelle genomes, it is desirable to explore ncRNA function by confirming its provenance. In this paper, the topology secondary structure, motif and the triplets under three reading frames are considered as parameters of ncRNAs. An SVM combined with the increment of diversity (ID) algorithm is applied to construct the classifier. When the method is applied to an ncRNA dataset with less than 80% sequence identity, the overall accuracies reach 95.57% and 96.40% in the five-fold cross-validation and the jackknife test, respectively. Further, for the independent testing dataset, the average prediction success rate of our method reached 93.24%. These high success rates indicate that our method is very helpful for distinguishing ncRNAs from various organelle genomes.
Affiliation(s)
- Cheng-Yan Wu
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qian-Zhong Li
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Zhen-Xing Feng
  - Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
|
6
|
Zhang N, Zhou Y, Huang T, Zhang YC, Li BQ, Chen L, Cai YD. Discriminating between lysine sumoylation and lysine acetylation using mRMR feature selection and analysis. PLoS One 2014; 9:e107464. [PMID: 25222670 PMCID: PMC4164654 DOI: 10.1371/journal.pone.0107464] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 04/26/2014] [Accepted: 08/10/2014] [Indexed: 11/18/2022] Open
Abstract
Post-translational modifications (PTMs) are crucial steps in protein synthesis and are important factors contributing to protein diversity. PTMs play important roles in the regulation of gene expression, protein stability and metabolism. Lysine residues in protein sequences have been found to be targeted for both types of PTMs: sumoylations and acetylations; however, each PTM has a different cellular role. As experimental approaches are often laborious and time consuming, it is challenging to distinguish the two types of PTMs on lysine residues using computational methods. In this study, we developed a method to discriminate between sumoylated lysine residues and acetylated residues. The method incorporated several features: PSSM conservation scores, amino acid factors, secondary structures, solvent accessibilities and disorder scores. By using the mRMR (Maximum Relevance Minimum Redundancy) method and the IFS (Incremental Feature Selection) method, an optimal feature set was selected from all of the incorporated features, with which the classifier achieved 92.14% accuracy with an MCC value of 0.7322. Analysis of the optimal feature set revealed some differences between acetylation and sumoylation. The results from our study also supported the previous finding that there exist different consensus motifs for the two types of PTMs. The results could suggest possible dominant factors governing the acetylation and sumoylation of lysine residues, shedding some light on the modification dynamics and molecular mechanisms of the two types of PTMs, and provide guidelines for experimental validations.
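The incremental feature selection (IFS) step mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the features have already been ranked (for example by mRMR, which is not implemented here), and `evaluate` is a hypothetical callback that returns a performance score, such as cross-validated accuracy or MCC, for a given feature subset.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: features ordered from most to least important
                     (assumed to come from a prior ranking such as mRMR).
    evaluate:        hypothetical callable taking a list of features and
                     returning a score where higher is better.
    Returns (best_subset, best_score).
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])  # top-k features of the ranking
        score = evaluate(subset)
        if score > best_score:              # ties keep the smaller subset
            best_subset, best_score = subset, score
    return best_subset, best_score
```

Keeping the first prefix that reaches the maximum score favours the smaller feature set when two prefixes perform equally well, which is one reasonable design choice among several.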
Affiliation(s)
- Ning Zhang
  - Department of Biomedical Engineering, Tianjin Key Lab of Biomedical Engineering Measurement, Tianjin University, Tianjin, P.R. China
- You Zhou
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, P.R. China
- Tao Huang
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, United States of America
- Yu-Chao Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences and Shanghai Jiao Tong University School of Medicine, Shanghai, P.R. China
- Bi-Qing Li
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, P.R. China
- Lei Chen
  - College of Information Engineering, Shanghai Maritime University, Shanghai, P.R. China
- Yu-Dong Cai
  - Institute of Systems Biology, Shanghai University, Shanghai, P.R. China
  - * E-mail:
|
7
|
Wang H, Wang M, Tan H, Li Y, Zhang Z, Song J. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection. PLoS One 2014; 9:e105902. [PMID: 25148528 PMCID: PMC4141844 DOI: 10.1371/journal.pone.0105902] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/23/2014] [Accepted: 07/25/2014] [Indexed: 01/14/2023] Open
Abstract
X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
Affiliation(s)
- Huilin Wang
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Mingjun Wang
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Hao Tan
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
- Yuan Li
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
- Ziding Zhang
  - State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing, China
  - * E-mail: (JS); (ZZ)
- Jiangning Song
  - National Engineering Laboratory for Industrial Enzymes and Key Laboratory of Systems Microbial Biotechnology, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, China
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, Monash University, Melbourne, Victoria, Australia
  - ARC Centre of Excellence in Structural and Functional Microbial Genomics, Monash University, Melbourne, Victoria, Australia
  - * E-mail: (JS); (ZZ)
|
8
|
Fu JJ, Yu YW, Lin HM, Chai JW, Chen CCC. Feature extraction and pattern classification of colorectal polyps in colonoscopic imaging. Comput Med Imaging Graph 2014; 38:267-75. [DOI: 10.1016/j.compmedimag.2013.12.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Received: 03/07/2013] [Revised: 11/14/2013] [Accepted: 12/16/2013] [Indexed: 11/15/2022]
|
9
|
Identification of biomarkers for esophageal squamous cell carcinoma using feature selection and decision tree methods. ScientificWorldJournal 2013; 2013:782031. [PMID: 24396308 PMCID: PMC3875100 DOI: 10.1155/2013/782031] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 09/24/2013] [Accepted: 11/25/2013] [Indexed: 01/21/2023] Open
Abstract
Esophageal squamous cell cancer (ESCC) is one of the most common fatal human cancers. The identification of biomarkers for early detection could be a promising strategy to decrease mortality. Previous studies utilized microarray techniques to identify more than one hundred genes; however, it is desirable to identify a small set of biomarkers for clinical use. This study proposes a sequential forward feature selection algorithm to design decision tree models for discriminating ESCC from normal tissues. Two potential biomarkers, RUVBL1 and CNIH, were identified and validated on two publicly available microarray datasets. To test the discrimination ability of the two biomarkers, 17 pairs of expression profiles of ESCC and normal tissues from Taiwanese male patients were measured using microarray techniques. The classification accuracies of the two biomarkers in all three datasets were higher than 90%. Interpretable decision tree models were constructed to analyze expression patterns of the two biomarkers. RUVBL1 was consistently overexpressed in all three datasets, although we found inconsistent CNIH expression, possibly affected by the diverse major risk factors for ESCC across different areas.
|
10
|
Ijaz A. SUMOhunt: Combining Spatial Staging between Lysine and SUMO with Random Forests to Predict SUMOylation. ISRN Bioinformatics 2013; 2013:671269. [PMID: 25937950 PMCID: PMC4393069 DOI: 10.1155/2013/671269] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 04/24/2013] [Accepted: 05/28/2013] [Indexed: 11/20/2022]
Abstract
Modification with SUMO protein has many key roles in eukaryotic systems which renders the identification of its target proteins and sites of considerable importance. Information regarding the SUMOylation of a protein may tell us about its subcellular localization, function, and spatial orientation. This modification occurs at particular and not all lysine residues in a given protein. In competition with biochemical means of modified-site recognition, computational methods are strong contenders in the prediction of SUMOylation-undergoing sites on proteins. In this research, physicochemical properties of amino acids retrieved from AAIndex, especially those involved in docking of modifier and target proteins and optimal presentation of target lysine, in combination with sequence information and random forest-based classifier presented in WEKA have been used to develop a prediction model, SUMOhunt, with statistics significantly better than all previous predictors. In this model 97.56% accuracy, 100% sensitivity, 94% specificity, and 0.95 MCC have been achieved which shows that proposed amino acid properties have a significant role in SUMO attachment. SUMOhunt will hence bring great reliability and efficiency in SUMOylation prediction.
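For readers less familiar with the four statistics quoted above, the Python sketch below shows how accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC) are derived from a binary confusion matrix. The function name and the example counts in the comment are hypothetical; the abstract does not report the underlying confusion matrix.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (not taken from the paper):
# binary_metrics(tp=50, fp=3, tn=47, fn=0)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside sensitivity and specificity for unbalanced site-prediction datasets.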
Affiliation(s)
- Amna Ijaz
- National Institute of Biotechnology and Genetic Engineering, P.O. Box 577, Jhang Road, Faisalabad, Pakistan
|
11
|
Zhou K, Ai C, Dong P, Fan X, Yang L. A novel model to predict O-glycosylation sites using a highly unbalanced dataset. Glycoconj J 2012; 29:551-64. [DOI: 10.1007/s10719-012-9434-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/26/2012] [Revised: 07/11/2012] [Accepted: 07/17/2012] [Indexed: 10/28/2022]
|
12
|
Sukumar N, Krein MP, Embrechts MJ. Predictive cheminformatics in drug discovery: statistical modeling for analysis of micro-array and gene expression data. Methods Mol Biol 2012; 910:165-94. [PMID: 22821597 DOI: 10.1007/978-1-61779-965-5_9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 01/23/2023]
Abstract
The vast amounts of chemical and biological data available through robotic high-throughput assays and micro-array technologies require computational techniques for visualization, analysis, and predictive modeling. Predictive cheminformatics and bioinformatics employ statistical methods to mine these data for hidden correlations and to retrieve molecules or genes with desirable biological activity from large databases, for the purpose of drug development. While many statistical methods are commonly employed and widely accessible, their proper use involves due consideration of data representation and preprocessing, model validation and domain of applicability estimation, similarity assessment, the nature of the structure-activity landscape, and model interpretation. This chapter seeks to review these considerations in light of the current state of the art in statistical modeling and to summarize the best practices in predictive cheminformatics.
Affiliation(s)
- N Sukumar
- Rensselaer Exploratory Center for Cheminformatics Research and Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA.
|
13
|
Fuzzy clustering of physicochemical and biochemical properties of amino acids. Amino Acids 2011; 43:583-94. [PMID: 21993537 PMCID: PMC3397137 DOI: 10.1007/s00726-011-1106-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Received: 05/25/2011] [Accepted: 09/23/2011] [Indexed: 12/03/2022]
Abstract
In this article, we categorize presently available experimental and theoretical knowledge of various physicochemical and biochemical features of amino acids, as collected in the AAindex database of known 544 amino acid (AA) indices. Previously reported 402 indices were categorized into six groups using hierarchical clustering technique and 142 were left unclustered. However, due to the increasing diversity of the database these indices are overlapping, therefore crisp clustering method may not provide optimal results. Moreover, in various large-scale bioinformatics analyses of whole proteomes, the proper selection of amino acid indices representing their biological significance is crucial for efficient and error-prone encoding of the short functional sequence motifs. In most cases, researchers perform exhaustive manual selection of the most informative indices. These two facts motivated us to analyse the widely used AA indices. The main goal of this article is twofold. First, we present a novel method of partitioning the bioinformatics data using consensus fuzzy clustering, where the recently proposed fuzzy clustering techniques are exploited. Second, we prepare three high quality subsets of all available indices. Superiority of the consensus fuzzy clustering method is demonstrated quantitatively, visually and statistically by comparing it with the previously proposed hierarchical clustered results. The processed AAindex1 database, supplementary material and the software are available at http://sysbio.icm.edu.pl/aaindex/.
|
14
|
Bauer DC, Buske FA, Bailey TL, Bodén M. Predicting SUMOylation sites in developmental transcription factors of Drosophila melanogaster. Neurocomputing 2010. [DOI: 10.1016/j.neucom.2010.01.022] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/26/2022]
|
15
|
Predicting miRNA's target from primary structure by the nearest neighbor algorithm. Mol Divers 2009; 14:719-29. [PMID: 20041294 DOI: 10.1007/s11030-009-9216-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/16/2009] [Accepted: 12/08/2009] [Indexed: 12/19/2022]
Abstract
We used a machine learning method, the nearest neighbor algorithm (NNA), to learn the relationship between miRNAs and their target proteins, generating a predictor which can then judge whether a new miRNA-target pair is true or not. We acquired 198 positive (true) miRNA-target pairs from Tarbase and the literature, and generated 4,888 negative (false) pairs through random combination. A 0/1 system and the frequencies of single nucleotides and di-nucleotides were used to encode miRNAs into vectors while various physicochemical parameters were used to encode the targets. The NNA was then applied, learning from these data to produce a predictor. We implemented minimum redundancy maximum relevance (mRMR) and properties forward selection (PFS) to reduce the redundancy of our encoding system, obtaining 91 most efficient properties. Finally, via the Jackknife cross-validation test, we got a positive accuracy of 69.2% and an overall accuracy of 96.0% with all the 253 properties. Besides, we got a positive accuracy of 83.8% and an overall accuracy of 97.2% with the 91 most efficient properties. A web-server for predictions is also made available at http://app3.biosino.org:8080/miRTP/index.jsp.
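The frequency-based part of the encoding and the nearest neighbor step described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it covers the single-nucleotide and di-nucleotide frequencies (4 + 16 = 20 values) but not the 0/1 system or the physicochemical target encoding, it uses Euclidean distance although the abstract does not name the distance measure, and both function names are hypothetical.

```python
import math
from itertools import product

NUCLEOTIDES = "ACGU"
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def encode_mirna(sequence):
    """Encode an RNA sequence as 4 mono- plus 16 di-nucleotide frequencies."""
    seq = sequence.upper().replace("T", "U")  # accept DNA-style input
    if not seq:
        raise ValueError("sequence must not be empty")
    mono = [seq.count(base) / len(seq) for base in NUCLEOTIDES]
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]  # overlapping pairs
    di = [pairs.count(d) / len(pairs) if pairs else 0.0 for d in DINUCLEOTIDES]
    return mono + di


def nearest_neighbor_label(query, training):
    """Return the label of the training vector closest to `query`.

    training: list of (vector, label) pairs.
    Assumption: Euclidean distance (math.dist, Python 3.8+).
    """
    if not training:
        raise ValueError("training set must not be empty")
    _, label = min(training, key=lambda item: math.dist(query, item[0]))
    return label
```

In a jackknife (leave-one-out) test, each sample would in turn be removed from `training`, classified with `nearest_neighbor_label`, and compared with its true label.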
|
16
|
Tell G, Quadrifoglio F, Tiribelli C, Kelley MR. The many functions of APE1/Ref-1: not only a DNA repair enzyme. Antioxid Redox Signal 2009; 11:601-20. [PMID: 18976116 PMCID: PMC2811080 DOI: 10.1089/ars.2008.2194] [Citation(s) in RCA: 383] [Impact Index Per Article: 23.9] [Received: 07/18/2008] [Revised: 07/21/2008] [Accepted: 08/02/2008] [Indexed: 01/22/2023]
Abstract
APE1/Ref-1 (APE1), the mammalian ortholog of Escherichia coli Xth, and a multifunctional protein possessing both DNA repair and transcriptional regulatory activities, has a pleiotropic role in controlling cellular response to oxidative stress. APE1 is the main apurinic/apyrimidinic endonuclease in eukaryotic cells, playing a central role in the DNA base excision repair pathway of all DNA lesions (uracil, alkylated and oxidized, and abasic sites), including single-strand breaks. It also has cotranscriptional activity, modulating the expression of genes directly regulated by both ubiquitous (e.g., AP-1, Egr-1, NF-κB, p53, and HIF) and tissue-specific (e.g., PEBP-2, Pax-5 and -8, and TTF-1) transcription factors. In addition, it controls the intracellular redox state by inhibiting reactive oxygen species (ROS) production. At present, information is still inadequate regarding the molecular mechanisms responsible for the coordinated control of its several activities. Its expression and/or subcellular localization are altered in several metabolic and proliferative disorders, such as tumors and aging. Here, we have attempted to coalesce the most relevant information concerning APE1's different functions in order to shed new light on, and to focus current and future studies toward fully understanding, this unique molecule, which is attracting growing interest and translational relevance in the field of molecular medicine.
Affiliation(s)
- Gianluca Tell
- Department of Biomedical Sciences and Technologies, University of Udine, Udine, Italy.
|
17
|
Schwamborn K, Knipscheer P, van Dijk E, van Dijk WJ, Sixma TK, Meloen RH, Langedijk JPM. SUMO assay with peptide arrays on solid support: insights into SUMO target sites. J Biochem 2008; 144:39-49. [PMID: 18344540 DOI: 10.1093/jb/mvn039] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/28/2022] Open
Abstract
The modification of proteins by SUMO (small ubiquitin-like modifier) regulates various cellular processes. Sumoylation often occurs on a specific lysine residue within the consensus motif ψKxE/D. However, little is known about the specificity and selectivity of SUMO target sites. We describe here a SUMO assay with peptide array on solid support for the simultaneous characterization of hundreds of different SUMO target sites. This approach was used to characterize known SUMO substrates. The position of the motif within the peptide and the amino acids flanking the acceptor site affected the efficiency of SUMO modification. Interestingly, a sequence of only four amino acids, corresponding to the SUMO consensus motif without flanking amino acids, was a bona fide target site. Analysis of a peptide library for all variants of the ψKxE/D consensus motif revealed that the first and third positions in the tetrapeptide preferably contain aromatic amino acid residues. Furthermore, by adding the SUMO E3 ligase PIAS1 to the reaction mixture, we show specific enhancement of the modification of a PIAS1-dependent SUMO substrate in this system. Overall, our results demonstrate that the sumoylation assay with peptide array on solid support can be used for the high-throughput characterization of SUMO target sites, and provide new insights into the composition, selectivity and specificity of SUMO target sites.
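The consensus motif named in the abstract above (a hydrophobic residue, the acceptor lysine, any residue, then glutamate or aspartate) lends itself to a simple sequence scan. The Python sketch below is one hedged way to do it: the set of residues accepted at the first position is an assumption chosen for illustration (the abstract does not list them, and it reports that aromatic residues are also favoured there), and the function name is hypothetical.

```python
import re

# Assumption: residues accepted at the hydrophobic first position of the motif.
# This set is illustrative; adjust it to the definition you want to test.
HYDROPHOBIC = "VILMF"

# Lookahead makes overlapping candidate sites visible to finditer.
CONSENSUS = re.compile(rf"(?=([{HYDROPHOBIC}]K[A-Z][ED]))")


def find_consensus_sites(sequence):
    """Return (lysine_position, tetrapeptide) pairs; positions are 1-based."""
    seq = sequence.upper()
    # m.start() is the 0-based index of the first motif residue, so the
    # lysine sits at 0-based index m.start() + 1, i.e. 1-based m.start() + 2.
    return [(m.start() + 2, m.group(1)) for m in CONSENSUS.finditer(seq)]
```

A motif match only flags a candidate lysine; as the abstract notes, flanking residues and motif position also affect modification efficiency, so hits from a scan like this would still need experimental or model-based confirmation.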
Affiliation(s)
- Klaus Schwamborn
- Pepscan Therapeutics BV, Zuidersluisweg 2, 8243 RC Lelystad, the Netherlands.
|
18
|
Pham LV, Zhou HJ, Lin-Lee YC, Tamayo AT, Yoshimura LC, Fu L, Darnay BG, Ford RJ. Nuclear Tumor Necrosis Factor Receptor-associated Factor 6 in Lymphoid Cells Negatively Regulates c-Myb-mediated Transactivation through Small Ubiquitin-related Modifier-1 Modification. J Biol Chem 2008; 283:5081-9. [DOI: 10.1074/jbc.m706307200] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 01/12/2023] Open
|
19
|
Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 2007; 36:D202-5. [PMID: 17998252 PMCID: PMC2238890 DOI: 10.1093/nar/gkm998] [Citation(s) in RCA: 702] [Impact Index Per Article: 39.0] [Indexed: 11/12/2022] Open
Abstract
AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. Accordingly AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).
Affiliation(s)
- Shuichi Kawashima
- Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai Minato-ku Tokyo 108-8639, Japan.
|