Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Oldfield CJ, Ulrich EL, Cheng Y, Dunker AK, Markley JL. Addressing the intrinsic disorder bottleneck in structural proteomics. Proteins 2006;59:444-53. [PMID: 15789434 DOI: 10.1002/prot.20446] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]



Cited by Other Article(s)

Structural and Functional Insights into CP2c Transcription Factor Complexes. Int J Mol Sci 2022;23:ijms23126369. [PMID: 35742810 PMCID: PMC9223585 DOI: 10.3390/ijms23126369] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2022] [Revised: 06/04/2022] [Accepted: 06/05/2022] [Indexed: 02/04/2023] Open

Computational Prediction of Intrinsically Disordered Proteins Based on Protein Sequences and Convolutional Neural Networks. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022;2021:4455604. [PMID: 34992646 PMCID: PMC8727116 DOI: 10.1155/2021/4455604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/06/2021] [Accepted: 12/08/2021] [Indexed: 11/17/2022]

Bondos SE, Dunker AK, Uversky VN. On the roles of intrinsically disordered proteins and regions in cell communication and signaling. Cell Commun Signal 2021;19:88. [PMID: 34461937 PMCID: PMC8404256 DOI: 10.1186/s12964-021-00774-3] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open

Identification of Intrinsically Disordered Protein Regions Based on Deep Neural Network-VGG16. ALGORITHMS 2021. [DOI: 10.3390/a14040107] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/12/2023]

Hameduh T, Haddad Y, Adam V, Heger Z. Homology modeling in the time of collective and artificial intelligence. Comput Struct Biotechnol J 2020;18:3494-3506. [PMID: 33304450 PMCID: PMC7695898 DOI: 10.1016/j.csbj.2020.11.007] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2020] [Revised: 11/04/2020] [Accepted: 11/04/2020] [Indexed: 12/12/2022] Open

Goh GKM, Dunker AK, Foster JA, Uversky VN. A Novel Strategy for the Development of Vaccines for SARS-CoV-2 (COVID-19) and Other Viruses Using AI and Viral Shell Disorder. J Proteome Res 2020;19:4355-4363. [PMID: 33006287 PMCID: PMC7640981 DOI: 10.1021/acs.jproteome.0c00672] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2020] [Indexed: 12/29/2022]

Abstract

A model that predicts levels of coronavirus (CoV) respiratory and fecal-oral transmission potentials based on the shell disorder has been built using neural network (artificial intelligence, AI) analysis of the percentage of disorder (PID) in the nucleocapsid, N, and membrane, M, proteins of the inner and outer viral shells, respectively. Using primarily the PID of N, SARS-CoV-2 is grouped as having intermediate levels of both respiratory and fecal-oral transmission potentials. Related studies, using similar methodologies, have found strong positive correlations between virulence and inner shell disorder among numerous viruses, including Nipah, Ebola, and Dengue viruses. There is some evidence that this is also true for SARS-CoV-2 and SARS-CoV, which have N PIDs of 48% and 50%, and case-fatality rates of 0.5-5% and 10.9%, respectively. The underlying relationship between virulence and respiratory potentials has to do with the viral loads of vital organs and body fluids, respectively. Viruses can spread by respiratory means only if the viral loads in saliva and mucus exceed certain minima. Similarly, a patient is likelier to die when the viral load overwhelms vital organs. Greater disorder in inner shell proteins has been known to play important roles in the rapid replication of viruses by enhancing the efficiency pertaining to protein-protein/DNA/RNA/lipid bindings. This paper suggests a novel strategy in attenuating viruses involving comparison of disorder patterns of inner shells (N) of related viruses to identify residues and regions that could be ideal for mutation. The M protein of SARS-CoV-2 has one of the lowest M PID values (6%) in its family, and therefore, this virus has one of the hardest outer shells, which makes it resistant to antimicrobial enzymes in body fluid. While this is likely responsible for its greater contagiousness, the risks of creating an attenuated virus with a more disordered M are discussed.
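The percentage of disorder (PID) used in the abstract above can be illustrated with a minimal sketch. It assumes that PID is the share of residues whose per-residue disorder score reaches a cutoff. The function name `percent_disorder`, the 0.5 cutoff, and the example scores are hypothetical and are not taken from the cited paper, which does not state its predictor settings here.

```python
def percent_disorder(scores, threshold=0.5):
    """Return the percentage of residues predicted to be disordered.

    scores: per-residue disorder scores in [0, 1] from any disorder predictor.
    threshold: assumed cutoff; a residue counts as disordered at or above it.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    disordered = sum(1 for s in scores if s >= threshold)
    return 100.0 * disordered / len(scores)


# Hypothetical scores for a 10-residue protein (illustration only).
example_scores = [0.9, 0.8, 0.2, 0.1, 0.6, 0.4, 0.3, 0.7, 0.1, 0.2]
pid = percent_disorder(example_scores)
print(f"PID = {pid:.1f}%")
```

Four of the ten example residues meet the cutoff, so the sketch reports a PID of 40%.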


Zhou J, Oldfield CJ, Yan W, Shen B, Dunker A. Identification of Intrinsic Disorder in Complexes from the Protein Data Bank. ACS OMEGA 2020;5:17883-17891. [PMID: 32743159 PMCID: PMC7391252 DOI: 10.1021/acsomega.9b03927] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/17/2019] [Accepted: 03/18/2020] [Indexed: 02/08/2023]

Yan J, Cheng J, Kurgan L, Uversky VN. Structural and functional analysis of "non-smelly" proteins. Cell Mol Life Sci 2020;77:2423-2440. [PMID: 31486849 PMCID: PMC11105052 DOI: 10.1007/s00018-019-03292-1] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 08/21/2019] [Accepted: 08/28/2019] [Indexed: 01/09/2023]

Ghadermarzi S, Li X, Li M, Kurgan L. Sequence-Derived Markers of Drug Targets and Potentially Druggable Human Proteins. Front Genet 2019;10:1075. [PMID: 31803227 PMCID: PMC6872670 DOI: 10.3389/fgene.2019.01075] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2019] [Accepted: 10/09/2019] [Indexed: 12/16/2022] Open

Abstract

Recent research shows that majority of the druggable human proteome is yet to be annotated and explored. Accurate identification of these unexplored druggable proteins would facilitate development, screening, repurposing, and repositioning of drugs, as well as prediction of new drug–protein interactions. We contrast the current drug targets against the datasets of non-druggable and possibly druggable proteins to formulate markers that could be used to identify druggable proteins. We focus on the markers that can be extracted from protein sequences or names/identifiers to ensure that they can be applied across the entire human proteome. These markers quantify key features covered in the past works (topological features of PPIs, cellular functions, and subcellular locations) and several novel factors (intrinsic disorder, residue-level conservation, alternative splicing isoforms, domains, and sequence-derived solvent accessibility). We find that the possibly druggable proteins have significantly higher abundance of alternative splicing isoforms, relatively large number of domains, higher degree of centrality in the protein-protein interaction networks, and lower numbers of conserved and surface residues, when compared with the non-druggable proteins. We show that the current drug targets and possibly druggable proteins share involvement in the catalytic and signaling functions. However, unlike the drug targets, the possibly druggable proteins participate in the metabolic and biosynthesis processes, are enriched in the intrinsic disorder, interact with proteins and nucleic acids, and are localized across the cell. To sum up, we formulate several markers that can help with finding novel druggable human proteins and provide interesting insights into the cellular functions and subcellular locations of the current drug targets and potentially druggable proteins.


Dishman AF, Volkman BF. Unfolding the Mysteries of Protein Metamorphosis. ACS Chem Biol 2018;13:1438-1446. [PMID: 29787234 PMCID: PMC6007232 DOI: 10.1021/acschembio.8b00276] [Citation(s) in RCA: 57] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Meng F, Murray GF, Kurgan L, Donahue HJ. Functional and structural characterization of osteocytic MLO-Y4 cell proteins encoded by genes differentially expressed in response to mechanical signals in vitro. Sci Rep 2018;8:6716. [PMID: 29712973 PMCID: PMC5928037 DOI: 10.1038/s41598-018-25113-4] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2017] [Accepted: 04/09/2018] [Indexed: 12/29/2022] Open

Gao J, Wu Z, Hu G, Wang K, Song J, Joachimiak A, Kurgan L. Survey of Predictors of Propensity for Protein Production and Crystallization with Application to Predict Resolution of Crystal Structures. Curr Protein Pept Sci 2018;19:200-210. [PMID: 28933304 PMCID: PMC7001581 DOI: 10.2174/1389203718666170921114437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/23/2017] [Revised: 09/14/2017] [Accepted: 09/14/2017] [Indexed: 11/22/2022]

Ereño-Orbea J, Sicard T, Cui H, Carson J, Hermans P, Julien JP. Structural Basis of Enhanced Crystallizability Induced by a Molecular Chaperone for Antibody Antigen-Binding Fragments. J Mol Biol 2017;430:322-336. [PMID: 29277294 DOI: 10.1016/j.jmb.2017.12.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2017] [Revised: 11/30/2017] [Accepted: 12/13/2017] [Indexed: 12/20/2022]

Halliwell LM, Jathoul AP, Bate JP, Worthy HL, Anderson JC, Jones DD, Murray JAH. ΔFlucs: Brighter Photinus pyralis firefly luciferases identified by surveying consecutive single amino acid deletion mutations in a thermostable variant. Biotechnol Bioeng 2017;115:50-59. [PMID: 28921549 DOI: 10.1002/bit.26451] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2017] [Revised: 09/08/2017] [Accepted: 09/11/2017] [Indexed: 11/05/2022]

Comparative analysis of amino acid composition in the active site of nirk gene encoding copper-containing nitrite reductase (CuNiR) in bacterial spp. Comput Biol Chem 2016;67:102-113. [PMID: 28068515 DOI: 10.1016/j.compbiolchem.2016.12.011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2015] [Revised: 06/13/2016] [Accepted: 12/29/2016] [Indexed: 11/22/2022]

Rahman KS, Chowdhury EU, Sachse K, Kaltenboeck B. Inadequate Reference Datasets Biased toward Short Non-epitopes Confound B-cell Epitope Prediction. J Biol Chem 2016;291:14585-99. [PMID: 27189949 PMCID: PMC4938180 DOI: 10.1074/jbc.m116.729020] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2016] [Revised: 05/03/2016] [Indexed: 11/06/2022] Open

Punta M, Simon I, Dosztányi Z. Prediction and analysis of intrinsically disordered proteins. Methods Mol Biol 2015;1261:35-59. [PMID: 25502193 DOI: 10.1007/978-1-4939-2230-7_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]

Dunker AK, Oldfield CJ. Back to the Future: Nuclear Magnetic Resonance and Bioinformatics Studies on Intrinsically Disordered Proteins. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2015;870:1-34. [PMID: 26387098 DOI: 10.1007/978-3-319-20164-1_1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Mizianty MJ, Fan X, Yan J, Chalmers E, Woloschuk C, Joachimiak A, Kurgan L. Covering complete proteomes with X-ray structures: a current snapshot. ACTA CRYSTALLOGRAPHICA. SECTION D, BIOLOGICAL CRYSTALLOGRAPHY 2014;70:2781-93. [PMID: 25372670 PMCID: PMC4220968 DOI: 10.1107/s1399004714019427] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/23/2014] [Accepted: 08/27/2014] [Indexed: 12/23/2022]

Intrinsically disordered proteins undergo and assist folding transitions in the proteome. Arch Biochem Biophys 2012;531:80-9. [PMID: 23142500 DOI: 10.1016/j.abb.2012.09.010] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2012] [Revised: 09/17/2012] [Accepted: 09/20/2012] [Indexed: 11/20/2022]

Midic U, Obradovic Z. Intrinsic disorder in putative protein sequences. Proteome Sci 2012;10 Suppl 1:S19. [PMID: 22759577 PMCID: PMC3380756 DOI: 10.1186/1477-5956-10-s1-s19] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open

Das RK, Mao AH, Pappu RV. Unmasking Functional Motifs Within Disordered Regions of Proteins. Sci Signal 2012;5:pe17. [DOI: 10.1126/scisignal.2003091] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/02/2022]

Montelione GT. The Protein Structure Initiative: achievements and visions for the future. F1000 BIOLOGY REPORTS 2012;4:7. [PMID: 22500193 PMCID: PMC3318194 DOI: 10.3410/b4-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Subscribe] [Scholar Register] [Indexed: 12/27/2022]

Kumar S. Homology modeling and consensus protein disorder prediction of human filamin. Bioinformation 2011;6:366-9. [PMID: 21904422 PMCID: PMC3163912 DOI: 10.6026/97320630006366] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2011] [Accepted: 07/14/2011] [Indexed: 11/23/2022] Open

Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 2011;21:412-8. [PMID: 21514145 DOI: 10.1016/j.sbi.2011.03.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2011] [Revised: 03/29/2011] [Accepted: 03/29/2011] [Indexed: 11/21/2022]

Adkins NL, Georgel PT. MeCP2: structure and function. This paper is one of a selection of papers published in a Special Issue entitled 31st Annual International Asilomar Chromatin and Chromosomes Conference, and has undergone the Journal’s usual peer review process. Biochem Cell Biol 2011;89:1-11. [DOI: 10.1139/o10-112] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/05/2023] Open

Graebsch A, Roche S, Kostrewa D, Söding J, Niessing D. Of bits and bugs--on the use of bioinformatics and a bacterial crystal structure to solve a eukaryotic repeat-protein structure. PLoS One 2010;5:e13402. [PMID: 20976240 PMCID: PMC2954813 DOI: 10.1371/journal.pone.0013402] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2010] [Accepted: 09/24/2010] [Indexed: 11/19/2022] Open

Abstract

Pur-α is a nucleic acid-binding protein involved in cell cycle control, transcription, and neuronal function. Initially no prediction of the three-dimensional structure of Pur-α was possible. However, recently we solved the X-ray structure of Pur-α from the fruitfly Drosophila melanogaster and showed that it contains a so-called PUR domain. Here we explain how we exploited bioinformatics tools in combination with X-ray structure determination of a bacterial homolog to obtain diffracting crystals and the high-resolution structure of Drosophila Pur-α. First, we used sensitive methods for remote-homology detection to find three repetitive regions in Pur-α. We realized that our lack of understanding how these repeats interact to form a globular domain was a major problem for crystallization and structure determination. With our information on the repeat motifs we then identified a distant bacterial homolog that contains only one repeat. We determined the bacterial crystal structure and found that two of the repeats interact to form a globular domain. Based on this bacterial structure, we calculated a computational model of the eukaryotic protein. The model allowed us to design a crystallizable fragment and to determine the structure of Drosophila Pur-α. Key for success was the fact that single repeats of the bacterial protein self-assembled into a globular domain, instructing us on the number and boundaries of repeats to be included for crystallization trials with the eukaryotic protein. This study demonstrates that the simpler structural domain arrangement of a distant prokaryotic protein can guide the design of eukaryotic crystallization constructs. Since many eukaryotic proteins contain multiple repeats or repeating domains, this approach might be instructive for structural studies of a range of proteins.


Babnigg G, Joachimiak A. Predicting protein crystallization propensity from protein sequence. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2010;11:71-80. [PMID: 20177794 DOI: 10.1007/s10969-010-9080-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2009] [Accepted: 02/05/2010] [Indexed: 10/19/2022]

Busche AEL, Aranko AS, Talebzadeh-Farooji M, Bernhard F, Dötsch V, Iwaï H. Segmental isotopic labeling of a central domain in a multidomain protein by protein trans-splicing using only one robust DnaE intein. Angew Chem Int Ed Engl 2009;48:6128-31. [PMID: 19591176 DOI: 10.1002/anie.200901488] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Busche A, Aranko A, Talebzadeh-Farooji M, Bernhard F, Dötsch V, Iwaï H. Segmental Isotopic Labeling of a Central Domain in a Multidomain Protein by Protein Trans-Splicing Using Only One Robust DnaE Intein. Angew Chem Int Ed Engl 2009. [DOI: 10.1002/ange.200901488] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Midic U, Oldfield CJ, Dunker AK, Obradovic Z, Uversky VN. Protein disorder in the human diseasome: unfoldomics of human genetic diseases. BMC Genomics 2009;10 Suppl 1:S12. [PMID: 19594871 PMCID: PMC2709255 DOI: 10.1186/1471-2164-10-s1-s12] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022] Open

Price WN, Chen Y, Handelman SK, Neely H, Manor P, Karlin R, Nair R, Liu J, Baran M, Everett J, Tong SN, Forouhar F, Swaminathan SS, Acton T, Xiao R, Luft JR, Lauricella A, DeTitta GT, Rost B, Montelione GT, Hunt JF. Understanding the physical properties that control protein crystallization by analysis of large-scale experimental data. Nat Biotechnol 2009;27:51-7. [PMID: 19079241 DOI: 10.1038/nbt.1514] [Citation(s) in RCA: 107] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Xue B, Oldfield CJ, Dunker AK, Uversky VN. CDF it all: consensus prediction of intrinsically disordered proteins based on various cumulative distribution functions. FEBS Lett 2009;583:1469-74. [PMID: 19351533 DOI: 10.1016/j.febslet.2009.03.070] [Citation(s) in RCA: 113] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2009] [Revised: 03/18/2009] [Accepted: 03/27/2009] [Indexed: 11/29/2022]

Markley JL, Aceti DJ, Bingman CA, Fox BG, Frederick RO, Makino SI, Nichols KW, Phillips GN, Primm JG, Sahu SC, Vojtik FC, Volkman BF, Wrobel RL, Zolnai Z. The Center for Eukaryotic Structural Genomics. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2009;10:165-79. [PMID: 19130299 PMCID: PMC2705709 DOI: 10.1007/s10969-008-9057-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/18/2008] [Accepted: 12/12/2008] [Indexed: 10/29/2022]

Han P, Zhang X, Norton RS, Feng ZP. Large-scale prediction of long disordered regions in proteins using random forests. BMC Bioinformatics 2009;10:8. [PMID: 19128505 PMCID: PMC2637845 DOI: 10.1186/1471-2105-10-8] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2008] [Accepted: 01/07/2009] [Indexed: 12/02/2022] Open

Abstract

Background

Many proteins contain disordered regions that lack fixed three-dimensional (3D) structure under physiological conditions but have important biological functions. Prediction of disordered regions in protein sequences is important for understanding protein function and in high-throughput determination of protein structures. Machine learning techniques, including neural networks and support vector machines have been widely used in such predictions. Predictors designed for long disordered regions are usually less successful in predicting short disordered regions. Combining prediction of short and long disordered regions will dramatically increase the complexity of the prediction algorithm and make the predictor unsuitable for large-scale applications. Efficient batch prediction of long disordered regions alone is of greater interest in large-scale proteome studies.

Results

A new algorithm, IUPforest-L, for predicting long disordered regions using the random forest learning model is proposed in this paper. IUPforest-L is based on the Moreau-Broto auto-correlation function of amino acid indices (AAIs) and other physicochemical features of the primary sequences. In 10-fold cross validation tests, IUPforest-L can achieve an area of 89.5% under the receiver operating characteristic (ROC) curve. Compared with existing disorder predictors, IUPforest-L has high prediction accuracy and is efficient for predicting long disordered regions in large-scale proteomes.

Conclusion

The random forest model based on the auto-correlation functions of the AAIs within a protein fragment and other physicochemical features could effectively detect long disordered regions in proteins. A new predictor, IUPforest-L, was developed to batch predict long disordered regions in proteins, and the server can be accessed from
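The Moreau-Broto auto-correlation feature named in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the IUPforest-L implementation: the Kyte-Doolittle hydropathy scale stands in for the amino acid indices (the abstract does not list which AAIs were used), dividing by the number of residue pairs is one common normalization, and the function name `moreau_broto` and the example sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in amino acid index.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def moreau_broto(sequence, index, max_lag):
    """Return normalized Moreau-Broto auto-correlations for lags 1..max_lag.

    For each lag d, the index values of all residue pairs (i, i + d) are
    multiplied and summed, then divided by the number of pairs (assumed
    normalization).
    """
    values = [index[aa] for aa in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


# Hypothetical fragment; the resulting vector would be one block of features
# that a classifier such as a random forest could consume.
features = moreau_broto("MKTAYIAKQRQISFVK", KYTE_DOOLITTLE, max_lag=5)
print(features)
```

Each lag contributes one number per amino acid index, so a fragment of any length maps to a fixed-size feature vector, which is what makes this kind of descriptor convenient for batch prediction.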


Lieutaud P, Canard B, Longhi S. MeDor: a metaserver for predicting protein disorder. BMC Genomics 2008;9 Suppl 2:S25. [PMID: 18831791 PMCID: PMC2559890 DOI: 10.1186/1471-2164-9-s2-s25] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Haquin S, Oeuillet E, Pajon A, Harris M, Jones AT, van Tilbeurgh H, Markley JL, Zolnai Z, Poupon A. Data management in structural genomics: an overview. Methods Mol Biol 2008;426:49-79. [PMID: 18542857 DOI: 10.1007/978-1-60327-058-8_4] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]

Abstract

Data management has been identified as a crucial issue in all large-scale experimental projects. In this type of project, many different persons manipulate multiple objects in different locations; thus, unless complete and accurate records are maintained, it is extremely difficult to understand exactly what has been done, when it was done, who did it, and what exact protocol was used. All of this information is essential for use in publications, reusing successful protocols, determining why a target has failed, and validating and optimizing protocols. Although data management solutions have been in place for certain focused activities (e.g., genome sequencing and microarray experiments), they are just emerging for more widespread projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single center, or across various centers, have important consequences for the design of information management systems. Because procedures are carried out by both machines and hand, the system must be capable of handling data entry both from robotic systems and by means of a user-friendly interface. The information management system needs to be flexible so it can handle changes in existing protocols or newly added protocols. Because no commercial information management systems have had the needed features, most structural genomics groups have developed their own solutions. This chapter discusses the advantages of using a LIMS (laboratory information management system), for day-to-day management of structural genomics projects, and also for data mining. 
This chapter reviews different solutions currently in place or under development with emphasis on three systems developed by the authors: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the Yeast Structural Genomics Laboratory, in collaboration with the European SPINE project).


Schlessinger A, Liu J, Rost B. Natively unstructured loops differ from other loops. PLoS Comput Biol 2008;3:e140. [PMID: 17658943 PMCID: PMC1924875 DOI: 10.1371/journal.pcbi.0030140] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2006] [Accepted: 06/05/2007] [Indexed: 11/24/2022] Open

Abstract

Natively unstructured or disordered protein regions may increase the functional complexity of an organism; they are particularly abundant in eukaryotes and often evade structure determination. Many computational methods predict unstructured regions by training on outliers in otherwise well-ordered structures. Here, we introduce an approach that uses a neural network in a very different and novel way. We hypothesize that very long contiguous segments with nonregular secondary structure (NORS regions) differ significantly from regular, well-structured loops, and that a method detecting such features could predict natively unstructured regions. Training our new method, NORSnet, on predicted information rather than on experimental data yielded three major advantages: it removed the overlap between testing and training, it systematically covered entire proteomes, and it explicitly focused on one particular aspect of unstructured regions with a simple structural interpretation, namely that they are loops. Our hypothesis was correct: well-structured and unstructured loops differ so substantially that NORSnet succeeded in their distinction. Benchmarks on previously used and new experimental data of unstructured regions revealed that NORSnet performed very well. Although it was not the best single prediction method, NORSnet was sufficiently accurate to flag unstructured regions in proteins that were previously not annotated. In one application, NORSnet revealed previously undetected unstructured regions in putative targets for structural genomics and may thereby contribute to increasing structural coverage of large eukaryotic families. NORSnet found unstructured regions more often in domain boundaries than expected at random. In another application, we estimated that 50%–70% of all worm proteins observed to have more than seven protein–protein interaction partners have unstructured regions. 
The comparative analysis between NORSnet and DISOPRED2 suggested that long unstructured loops are a major part of unstructured regions in molecular networks.

The details of protein structures are important for function. Regions that do not adopt any regular structure in isolation (natively unstructured or disordered regions) initially appeared as a curious exception to this structure–function paradigm. It has become increasingly clear that unstructured regions are fundamental to many roles and that they are particularly important for multicellular organisms. Structural biology is just beginning to apprehend the stunning diversity of these roles. Here, we focused on unstructured regions dominated by a particular type of loop, namely the natively unstructured one. We developed a method that succeeded in the distinction between well-structured and natively unstructured loops. For the development, we did not use any experimental data for unstructured regions; when tested on experimental data, the method performed surprisingly well. Due to its different premises, the method captured very different aspects of unstructured regions than other methods that we tested. We applied the new method to two different problems. The first was the identification of proteins that may be difficult targets for structure determination. The second was the identification of worm proteins that have many interaction partners (more than seven) and unstructured regions. Surprisingly, we found unstructured regions of the loopy type in more than 50% of all the promiscuous worm proteins.


Bulashevska A, Eils R. Using Bayesian multinomial classifier to predict whether a given protein sequence is intrinsically disordered. J Theor Biol 2008;254:799-803. [PMID: 18611404 DOI: 10.1016/j.jtbi.2008.05.040] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2007] [Revised: 05/19/2008] [Accepted: 05/19/2008] [Indexed: 10/21/2022]

Flexible nets: disorder and induced fit in the associations of p53 and 14-3-3 with their partners. BMC Genomics 2008;9 Suppl 1:S1. [PMID: 18366598 PMCID: PMC2386051 DOI: 10.1186/1471-2164-9-s1-s1] [Citation(s) in RCA: 438] [Impact Index Per Article: 27.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open

Huang YJ, Hang D, Lu LJ, Tong L, Gerstein MB, Montelione GT. Targeting the human cancer pathway protein interaction network by structural genomics. Mol Cell Proteomics 2008;7:2048-60. [PMID: 18487680 DOI: 10.1074/mcp.m700550-mcp200] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open

Abstract

Structural genomics provides an important approach for characterizing and understanding systems biology. As a step toward better integrating protein three-dimensional (3D) structural information in cancer systems biology, we have constructed a Human Cancer Pathway Protein Interaction Network (HCPIN) by analysis of several classical cancer-associated signaling pathways and their physical protein-protein interactions. Many well known cancer-associated proteins play central roles as "hubs" or "bottlenecks" in the HCPIN. At least half of HCPIN proteins are either directly associated with or interact with multiple signaling pathways. Although some 45% of residues in these proteins are in sequence segments that meet criteria sufficient for approximate homology modeling (Basic Local Alignment Search Tool (BLAST) E-value <10⁻⁶), only approximately 20% of residues in these proteins are structurally covered using high accuracy homology modeling criteria (i.e. BLAST E-value <10⁻⁶ and at least 80% sequence identity) or by actual experimental structures. The HCPIN Website provides a comprehensive description of this biomedically important multipathway network together with experimental and homology models of HCPIN proteins useful for cancer biology research. To complement and enrich cancer systems biology, the Northeast Structural Genomics Consortium is targeting >1000 human proteins and protein domains from the HCPIN for sample production and 3D structure determination. The long range goal of this effort is to provide a comprehensive 3D structure-function database for human cancer-associated proteins and protein complexes in the context of their interaction networks. The network-based target selection (BioNet) approach described here is an example of a general strategy for targeting co-functioning proteins by structural genomics projects.


Ishida T, Kinoshita K. Prediction of disordered regions in proteins based on the meta approach. Bioinformatics 2008;24:1344-8. [PMID: 18426805 DOI: 10.1093/bioinformatics/btn195] [Citation(s) in RCA: 212] [Impact Index Per Article: 13.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Bannen RM, Bingman CA, Phillips GN. Effect of low-complexity regions on protein structure determination. J Struct Funct Genomics 2008;8:217-26. [PMID: 18302007 DOI: 10.1007/s10969-008-9039-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2007] [Accepted: 02/05/2008] [Indexed: 11/24/2022]

Bordoli L, Kiefer F, Schwede T. Assessment of disorder predictions in CASP7. Proteins 2008;69 Suppl 8:129-36. [PMID: 17680688 DOI: 10.1002/prot.21671] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]

Slabinski L, Jaroszewski L, Rodrigues APC, Rychlewski L, Wilson IA, Lesley SA, Godzik A. The challenge of protein structure determination--lessons from structural genomics. Protein Sci 2007;16:2472-82. [PMID: 17962404 DOI: 10.1110/ps.073037907] [Citation(s) in RCA: 95] [Impact Index Per Article: 5.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]

Dosztányi Z, Tompa P. Prediction of protein disorder. Methods Mol Biol 2008;426:103-115. [PMID: 18542859 DOI: 10.1007/978-1-60327-058-8_6] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/26/2023]

Slabinski L, Jaroszewski L, Rychlewski L, Wilson IA, Lesley SA, Godzik A. XtalPred: a web server for prediction of protein crystallizability. Bioinformatics 2007;23:3403-5. [PMID: 17921170 DOI: 10.1093/bioinformatics/btm477] [Citation(s) in RCA: 218] [Impact Index Per Article: 12.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022]

Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res 2007;35:W460-4. [PMID: 17567614 PMCID: PMC1933209 DOI: 10.1093/nar/gkm363] [Citation(s) in RCA: 609] [Impact Index Per Article: 35.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023] Open

Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T. POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 2007;23:2046-53. [PMID: 17545177 DOI: 10.1093/bioinformatics/btm302] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open

Knappenberger JA, Lecomte JTJ. Loop anchor modification causes the population of an alternative native state in an SH3-like domain. Protein Sci 2007;16:863-79. [PMID: 17456740 PMCID: PMC2206634 DOI: 10.1110/ps.062469507] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/23/2022]