Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Forslund K, Sonnhammer ELL. Benchmarking homology detection procedures with low complexity filters. Bioinformatics 2009;25:2500-5. [PMID: 19620098 DOI: 10.1093/bioinformatics/btp446] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]

Cited by Other Article(s)

Torres AG, Rodríguez-Escribà M, Marcet-Houben M, Santos Vieira H, Camacho N, Catena H, Murillo Recio M, Rafels-Ybern À, Reina O, Torres F, Pardo-Saganta A, Gabaldón T, Novoa E, Ribas de Pouplana L. Human tRNAs with inosine 34 are essential to efficiently translate eukarya-specific low-complexity proteins. Nucleic Acids Res 2021;49:7011-7034. [PMID: 34125917 PMCID: PMC8266599 DOI: 10.1093/nar/gkab461] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/15/2021] [Revised: 05/07/2021] [Accepted: 05/18/2021] [Indexed: 12/11/2022]

Affiliation(s)

Adrian Gabriel Torres: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Marta Rodríguez-Escribà: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Marina Marcet-Houben: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain; Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Catalonia 08034, Spain
Helaine Graziele Santos Vieira: Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain
Noelia Camacho: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Helena Catena: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Marina Murillo Recio: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Àlbert Rafels-Ybern: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Oscar Reina: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Francisco Miguel Torres: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain
Ana Pardo-Saganta: Centre for Applied Medical Research (CIMA Universidad de Navarra), Pamplona 31008, Spain
Toni Gabaldón: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain; Barcelona Supercomputing Centre (BSC-CNS), Barcelona, Catalonia 08034, Spain; Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia 08010, Spain
Eva Maria Novoa: Centre for Genomic Regulation, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08003, Spain; University Pompeu Fabra, Barcelona, Catalonia 08003, Spain
Lluís Ribas de Pouplana: Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, Catalonia 08028, Spain; Catalan Institution for Research and Advanced Studies, Barcelona, Catalonia 08010, Spain

Carroll HD, Spouge JL, Gonzalez M. MultiDomainBenchmark: a multi-domain query and subject database suite. BMC Bioinformatics 2019;20:77. [PMID: 30764761 PMCID: PMC6376684 DOI: 10.1186/s12859-019-2660-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/14/2018] [Accepted: 01/28/2019] [Indexed: 11/10/2022]

Saripella GV, Sonnhammer ELL, Forslund K. Benchmarking the next generation of homology inference tools. Bioinformatics 2016;32:2636-41. [PMID: 27256311 PMCID: PMC5013910 DOI: 10.1093/bioinformatics/btw305] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 07/02/2015] [Accepted: 05/05/2016] [Indexed: 12/21/2022]

Abstract

Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the ‘next generation’ of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.

Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM + Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases.

Results: CS-BLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization.

Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.

Availability and Implementation: Benchmark datasets and all scripts are available at http://sonnhammer.org/download/Homology_benchmark.

Contact: forslund@embl.de

Supplementary information: Supplementary data are available at Bioinformatics online.

Powell S, Forslund K, Szklarczyk D, Trachana K, Roth A, Huerta-Cepas J, Gabaldón T, Rattei T, Creevey C, Kuhn M, Jensen LJ, von Mering C, Bork P. eggNOG v4.0: nested orthology inference across 3686 organisms. Nucleic Acids Res 2013;42:D231-9. [PMID: 24297252 PMCID: PMC3964997 DOI: 10.1093/nar/gkt1253] [Citation(s) in RCA: 464] [Impact Index Per Article: 38.7] [Indexed: 12/20/2022]

Mistry J, Coggill P, Eberhardt RY, Deiana A, Giansanti A, Finn RD, Bateman A, Punta M. The challenge of increasing Pfam coverage of the human proteome. Database: The Journal of Biological Databases and Curation 2013;2013:bat023. [PMID: 23603847 PMCID: PMC3630804 DOI: 10.1093/database/bat023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 01/25/2023]

Abstract

It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for ∼38% of the human protein residues, there was no information in Pfam about conservation and evolutionary relationship with other protein regions. This uncovered portion of the human proteome was found to be distributed over almost 25 000 distinct protein regions. Comparison with proteins in the UniProtKB database suggested that the human regions that exhibited similarity to thousands of other sequences were often either divergent elements or N- or C-terminal extensions of existing families. Thirty-four per cent of regions, on the other hand, matched fewer than 100 sequences in UniProtKB. Most of these did not appear to share any relationship with existing Pfam-A families, suggesting that thousands of new families would need to be generated to cover them. Also, these latter regions were particularly rich in amino acid compositional bias such as the one associated with intrinsic disorder. This could represent a significant obstacle toward their inclusion into new Pfam families. Based on these observations, a major focus for increasing Pfam coverage of the human proteome will be to improve the definition of existing families. New families will also be built, prioritizing those that have been experimentally functionally characterized.

Database URL: http://pfam.sanger.ac.uk/

Schreiber F, Sonnhammer ELL. Hieranoid: hierarchical orthology inference. J Mol Biol 2013;425:2072-2081. [PMID: 23485417 DOI: 10.1016/j.jmb.2013.02.018] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Received: 12/06/2012] [Revised: 02/13/2013] [Accepted: 02/16/2013] [Indexed: 12/13/2022]

Frith MC. Gentle masking of low-complexity sequences improves homology search. PLoS One 2011;6:e28819. [PMID: 22205972 PMCID: PMC3242753 DOI: 10.1371/journal.pone.0028819] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 10/17/2011] [Accepted: 11/15/2011] [Indexed: 11/19/2022]

Forslund K, Schreiber F, Thanintorn N, Sonnhammer ELL. OrthoDisease: tracking disease gene orthologs across 100 species. Brief Bioinform 2011;12:463-73. [PMID: 21565935 DOI: 10.1093/bib/bbr024] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]

Protein disorder--a breakthrough invention of evolution? Curr Opin Struct Biol 2011;21:412-8. [PMID: 21514145 DOI: 10.1016/j.sbi.2011.03.014] [Citation(s) in RCA: 112] [Impact Index Per Article: 8.0] [Received: 02/14/2011] [Revised: 03/29/2011] [Accepted: 03/29/2011] [Indexed: 11/21/2022]

Ostlund G, Schmitt T, Forslund K, Köstler T, Messina DN, Roopra S, Frings O, Sonnhammer ELL. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res 2009;38:D196-203. [PMID: 19892828 PMCID: PMC2808972 DOI: 10.1093/nar/gkp931] [Citation(s) in RCA: 469] [Impact Index Per Article: 29.3] [Indexed: 12/04/2022]