Reference Citation Analysis

For: Gouzy J, Eugéne P, Greene EA, Kahn D, Corpet F. XDOM, a graphical tool to analyse domain arrangements in any set of protein sequences. Comput Appl Biosci 1997;13:601-8. [PMID: 9475988 DOI: 10.1093/bioinformatics/13.6.601] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 02/06/2023]

Cited by Other Article(s)

Abnousi A, Broschat SL, Kalyanaraman A. A Fast Alignment-Free Approach for De Novo Detection of Protein Conserved Regions. PLoS One 2016;11:e0161338. [PMID: 27552220 PMCID: PMC4995020 DOI: 10.1371/journal.pone.0161338] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/29/2016] [Accepted: 08/03/2016] [Indexed: 12/05/2022]

Abstract

BACKGROUND

Identifying conserved regions in protein sequences is a fundamental operation that occurs in numerous sequence-driven analysis pipelines. It is used to decode domain-rich regions within proteins, to compute protein clusters, to annotate sequence function, and to compute evolutionary relationships among protein sequences. A number of approaches exist for identifying and characterizing protein families based on their domains, and because domains represent conserved portions of a protein sequence, the primary computation involved in protein family characterization is the identification of such conserved regions. However, identifying conserved regions in large collections (millions) of protein sequences presents significant challenges.

METHODS

In this paper we present NADDA (No-Alignment Domain Detection Algorithm), a new, alignment-free method for detecting conserved regions in protein sequences. Our method exploits the abundance of exact-matching short subsequences (k-mers) to detect conserved regions quickly, and uses machine learning to improve detection accuracy. We present a parallel implementation of NADDA using the MapReduce framework and show that it is highly scalable.
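The abstract does not specify NADDA's feature set or classifier, so the following is only a minimal Python sketch of the underlying k-mer idea, assuming that a residue counts as "conserved" when it is covered by a k-mer that occurs in many sequences of the collection. The function names (`count_kmers`, `residue_scores`, `conserved_regions`) and the fixed `min_count` threshold are hypothetical; NADDA itself passes k-mer information to a machine-learning model and runs on MapReduce, neither of which is reproduced here.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count in how many sequences each k-mer occurs (at most once per sequence)."""
    counts = Counter()
    for seq in sequences:
        # A set, so a k-mer repeated inside one sequence is counted only once.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def residue_scores(seq, counts, k):
    """Score each residue by the highest collection-wide count of any k-mer covering it."""
    scores = [0] * len(seq)
    for i in range(len(seq) - k + 1):
        c = counts[seq[i:i + k]]
        for j in range(i, i + k):
            scores[j] = max(scores[j], c)
    return scores


def conserved_regions(scores, min_count):
    """Return half-open (start, end) intervals where every score is >= min_count.

    min_count is a hypothetical fixed threshold standing in for a learned model.
    """
    regions, start = [], None
    for i, s in enumerate(scores):
        if s >= min_count and start is None:
            start = i
        elif s < min_count and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(scores)))
    return regions
```

For example, with `k = 4` and the toy sequences `MKVLAAGIRST`, `PPMKVLAAGQW`, and `TTGMKVLAAGE`, the run `MKVLAAG` shared by all three is reported as the region `(0, 7)` of the first sequence when `min_count = 3`. Counting each k-mer once per sequence keeps a single low-complexity repeat from looking conserved across the collection.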

RESULTS

We compared NADDA with the Pfam and InterPro databases. For known domains annotated by Pfam, accuracy is 83%, sensitivity 96%, and specificity 44%. For sequences with new domains not present in the training set, an average accuracy of 63% is achieved when compared to Pfam. Improved results when compared with InterPro demonstrate the ability of NADDA to capture conserved regions beyond those present in Pfam. We also compared NADDA with ADDA and MKDOM2, using Pfam as ground truth. On average, NADDA shows comparable accuracy and more balanced sensitivity and specificity, and, being alignment-free, is significantly faster. Excluding the one-time cost of training, runtimes on a single processor were 49 s, 10,566 s, and 456 s for NADDA, ADDA, and MKDOM2, respectively, on a data set of approximately 2,500 sequences.
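The accuracy, sensitivity, and specificity figures above follow the standard confusion-matrix definitions. The sketch below assumes, hypothetically, that predictions and Pfam annotations are compared as per-residue boolean labels (inside a domain or not); the abstract does not state the unit of comparison, and the function names are illustrative only.

```python
def confusion_counts(predicted, truth):
    """Count (TP, FP, TN, FN) over paired per-residue boolean labels."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    tp = fp = tn = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and not t:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(predicted, truth):
    """Accuracy, sensitivity (recall on positives), and specificity (recall on negatives).

    Raises ZeroDivisionError if truth has no positives or no negatives.
    """
    tp, fp, tn, fn = confusion_counts(predicted, truth)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }
```

Because accuracy equals the prevalence-weighted mean of sensitivity and specificity, a high sensitivity paired with a low specificity can still yield a high accuracy when most compared positions are true positives.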

Moore AD, Held A, Terrapon N, Weiner J, Bornberg-Bauer E. DoMosaics: software for domain arrangement visualization and domain-centric analysis of proteins. Bioinformatics 2013;30:282-3. [PMID: 24222210 DOI: 10.1093/bioinformatics/btt640] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Indexed: 02/05/2023]

Piwowar M, Krzysztof P, Piotr P. ExonVisualiser - application for visualization exon units in 2D and 3D protein structures. Bioinformation 2012;8:1280-2. [PMID: 23275735 PMCID: PMC3532015 DOI: 10.6026/97320630081280] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/12/2012] [Accepted: 11/14/2012] [Indexed: 11/23/2022]

Bae K, Mallick BK, Elsik CG. Prediction of protein interdomain linker regions by a hidden Markov model. Bioinformatics 2005;21:2264-70. [PMID: 15746283 DOI: 10.1093/bioinformatics/bti363] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022]

Veeramachaneni V, Makałowski W. Visualizing sequence similarity of protein families. Genome Res 2004;14:1160-9. [PMID: 15140831 PMCID: PMC419794 DOI: 10.1101/gr.2079204] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/24/2022]

Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res 2003;31:6633-9. [PMID: 14602924 PMCID: PMC275543 DOI: 10.1093/nar/gkg847] [Citation(s) in RCA: 281] [Impact Index Per Article: 14.1] [Indexed: 11/12/2022]

Mohseni-Zadeh S, Louis A, Brézellec P, Risler JL. PHYTOPROT: a database of clusters of plant proteins. Nucleic Acids Res 2004;32:D351-3. [PMID: 14681432 PMCID: PMC308774 DOI: 10.1093/nar/gkh040] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Holzerlandt R, Orengo C, Kellam P, Albà MM. Identification of new herpesvirus gene homologs in the human genome. Genome Res 2002;12:1739-48. [PMID: 12421761 PMCID: PMC187546 DOI: 10.1101/gr.334302] [Citation(s) in RCA: 77] [Impact Index Per Article: 3.5] [Indexed: 12/23/2022]

Louis A, Ollivier E, Aude JC, Risler JL. Massive sequence comparisons as a help in annotating genomic sequences. Genome Res 2001;11:1296-303. [PMID: 11435413 PMCID: PMC311131 DOI: 10.1101/gr.gr-1776r] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/25/2022]

Nierman WC, Feldblyum TV, Laub MT, Paulsen IT, Nelson KE, Eisen JA, Heidelberg JF, Alley MR, Ohta N, Maddock JR, Potocka I, Nelson WC, Newton A, Stephens C, Phadke ND, Ely B, DeBoy RT, Dodson RJ, Durkin AS, Gwinn ML, Haft DH, Kolonay JF, Smit J, Craven MB, Khouri H, Shetty J, Berry K, Utterback T, Tran K, Wolf A, Vamathevan J, Ermolaeva M, White O, Salzberg SL, Venter JC, Shapiro L, Fraser CM, Eisen J. Complete genome sequence of Caulobacter crescentus. Proc Natl Acad Sci U S A 2001;98:4136-41. [PMID: 11259647 PMCID: PMC31192 DOI: 10.1073/pnas.061029298] [Citation(s) in RCA: 388] [Impact Index Per Article: 16.9] [Indexed: 11/18/2022]

Albà MM, Lee D, Pearl FM, Shepherd AJ, Martin N, Orengo CA, Kellam P. VIDA: a virus database system for the organization of animal virus genome open reading frames. Nucleic Acids Res 2001;29:133-6. [PMID: 11125070 PMCID: PMC29831 DOI: 10.1093/nar/29.1.133] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Received: 09/01/2000] [Revised: 10/27/2000] [Accepted: 10/27/2000] [Indexed: 11/13/2022]

Albà MM, Das R, Orengo CA, Kellam P. Genomewide function conservation and phylogeny in the Herpesviridae. Genome Res 2001;11:43-54. [PMID: 11156614 PMCID: PMC311046 DOI: 10.1101/gr.149801] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.5] [Indexed: 11/25/2022]