Reference Citation Analysis

For: Keul F, Hess M, Goesele M, Hamacher K. PFASUM: a substitution matrix from Pfam structural alignments. BMC Bioinformatics 2017;18:293. [PMID: 28583067 PMCID: PMC5460430 DOI: 10.1186/s12859-017-1703-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 03/01/2017] [Accepted: 05/22/2017] [Indexed: 11/10/2022]

Cited by Other Article(s)

Parkinson J, Hard R, Ko YS, Wang W. RESP2: An uncertainty aware multi-target multi-property optimization AI pipeline for antibody discovery. bioRxiv 2025:2024.07.30.605700. [PMID: 39131296 PMCID: PMC11312550 DOI: 10.1101/2024.07.30.605700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/13/2024]

Johnson SR, Fu X, Viknander S, Goldin C, Monaco S, Zelezniak A, Yang KK. Computational scoring and experimental evaluation of enzymes generated by neural networks. Nat Biotechnol 2025;43:396-405. [PMID: 38653796 PMCID: PMC11919684 DOI: 10.1038/s41587-024-02214-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 03/20/2024] [Indexed: 04/25/2024]

Wright ES. Tandem Repeats Provide Evidence for Convergent Evolution to Similar Protein Structures. Genome Biol Evol 2025;17:evaf013. [PMID: 39852593 PMCID: PMC11812678 DOI: 10.1093/gbe/evaf013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2024] [Accepted: 01/17/2025] [Indexed: 01/26/2025]

Abstract

Homology is a key concept underpinning the comparison of sequences across organisms. Sequence-level homology is based on a statistical framework optimized over decades of work. Recently, computational protein structure prediction has enabled large-scale homology inference beyond the limits of accurate sequence alignment. In this regime, it is possible to observe nearly identical protein structures lacking detectable sequence similarity. In the absence of a robust statistical framework for structure comparison, it is largely assumed similar structures are homologous. However, it is conceivable that matching structures could arise through convergent evolution, resulting in analogous proteins without shared ancestry. Large databases of predicted structures offer a means of determining whether analogs are present among structure matches. Here, I find that a small subset (∼2.6%) of Foldseek clusters lack sequence-level support for homology, including ∼1% of strong structure matches with template modeling score ≥ 0.5. This result by itself does not imply these structure pairs are nonhomologous, since their sequences could have diverged beyond the limits of recognition. Yet, strong matches without sequence-level support for homology are enriched in structures with predicted repeats that could induce spurious matches. Some of these structural repeats are underpinned by sequence-level tandem repeats in both matching structures. I show that many of these tandem repeat units have genealogies inconsistent with their corresponding structures sharing a common ancestor, implying these highly similar structure pairs are analogous rather than homologous. This result suggests caution is warranted when inferring homology from structural resemblance alone in the absence of sequence-level support for homology.
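The abstract uses a template modeling (TM) score of 0.5 as the cutoff for a strong structure match. As a reminder of what that number measures, here is a minimal Python sketch of the standard TM-score formula, assuming the two structures are already superposed and the distances between aligned residue pairs are given. Tools such as TM-align additionally search over superpositions and residue alignments to maximize the score, which this sketch does not do; the function names `tm_d0` and `tm_score` are hypothetical, and the 0.5 floor on d0 is an assumed convention for short chains.

```python
from typing import Sequence


def tm_d0(length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by the TM-score.

    d0 = 1.24 * (L - 15) ** (1/3) - 1.8, floored at 0.5 for short chains
    (the floor is an assumed convention for this sketch).
    """
    if length <= 15:
        return 0.5
    return max(1.24 * (length - 15) ** (1.0 / 3.0) - 1.8, 0.5)


def tm_score(distances: Sequence[float], target_length: int) -> float:
    """TM-score for one fixed superposition and residue alignment.

    distances: distance (angstroms) between each aligned residue pair after superposition.
    target_length: number of residues in the target structure; unaligned residues
    add nothing to the sum but still count in this denominator.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


# Hypothetical example: 80 of 100 target residues aligned, each 1 angstrom apart.
score = tm_score([1.0] * 80, target_length=100)
print(f"TM-score {score:.3f}; passes the 0.5 cutoff: {score >= 0.5}")
```

Because the denominator is the full target length, a short but perfect local match cannot reach a high TM-score, which is why a threshold such as 0.5 is read as evidence of global fold similarity.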

Lu YY, Noble WS, Keich U. A BLAST from the past: revisiting blastp's E-value. Bioinformatics 2024;40:btae729. [PMID: 39656790 PMCID: PMC11652269 DOI: 10.1093/bioinformatics/btae729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2024] [Revised: 10/25/2024] [Accepted: 12/02/2024] [Indexed: 12/17/2024]

Postovskaya A, Vercauteren K, Meysman P, Laukens K. tcrBLOSUM: an amino acid substitution matrix for sensitive alignment of distant epitope-specific TCRs. Brief Bioinform 2024;26:bbae602. [PMID: 39576224 PMCID: PMC11583439 DOI: 10.1093/bib/bbae602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2024] [Revised: 10/07/2024] [Accepted: 11/05/2024] [Indexed: 11/24/2024]

Chow CFW, Ghosh S, Hadarovich A, Toth-Petroczy A. SHARK enables sensitive detection of evolutionary homologs and functional analogs in unalignable and disordered sequences. Proc Natl Acad Sci U S A 2024;121:e2401622121. [PMID: 39383002 PMCID: PMC11494347 DOI: 10.1073/pnas.2401622121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2024] [Accepted: 08/30/2024] [Indexed: 10/11/2024]

Wright E. Accurately clustering biological sequences in linear time by relatedness sorting. Nat Commun 2024;15:3047. [PMID: 38589369 PMCID: PMC11001989 DOI: 10.1038/s41467-024-47371-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/27/2022] [Accepted: 03/28/2024] [Indexed: 04/10/2024]

Jia K, Kilinc M, Jernigan RL. New alignment method for remote protein sequences by the direct use of pairwise sequence correlations and substitutions. Front Bioinform 2023;3:1227193. [PMID: 37900964 PMCID: PMC10602800 DOI: 10.3389/fbinf.2023.1227193] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 08/14/2023] [Indexed: 10/31/2023]

Caswell B, Summers TJ, Licup GL, Cantu DC. Mutation Space of Spatially Conserved Amino Acid Sites in Proteins. ACS Omega 2023;8:24302-24310. [PMID: 37457482 PMCID: PMC10339398 DOI: 10.1021/acsomega.3c01473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Accepted: 06/14/2023] [Indexed: 07/18/2023]

Llinares-López F, Berthet Q, Blondel M, Teboul O, Vert JP. Deep embedding and alignment of protein sequences. Nat Methods 2023;20:104-111. [PMID: 36522501 DOI: 10.1038/s41592-022-01700-2] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 11/30/2021] [Accepted: 10/24/2022] [Indexed: 12/23/2022]

Sumanaweera D, Allison L, Konagurthu AS. Bridging the gaps in statistical models of protein alignment. Bioinformatics 2022;38:i229-i237. [PMID: 35758809 PMCID: PMC9235498 DOI: 10.1093/bioinformatics/btac246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]

Wei Q, Zou H, Zhong C, Xu J. RPfam: A refiner towards curated-like multiple sequence alignments of the Pfam protein families. J Bioinform Comput Biol 2022;20:2240002. [DOI: 10.1142/s0219720022400029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]

Jones DAB, Moolhuijzen PM, Hane JK. Remote homology clustering identifies lowly conserved families of effector proteins in plant-pathogenic fungi. Microb Genom 2021;7. [PMID: 34468307 PMCID: PMC8715435 DOI: 10.1099/mgen.0.000637] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/14/2022]

Trivedi R, Nagarajaram HA. Substitution scoring matrices for proteins - An overview. Protein Sci 2020;29:2150-2163. [PMID: 32954566 DOI: 10.1002/pro.3954] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 08/17/2020] [Revised: 09/17/2020] [Accepted: 09/18/2020] [Indexed: 01/17/2023]

Polyanovsky V, Lifanov A, Esipova N, Tumanyan V. The ranging of amino acids substitution matrices of various types in accordance with the alignment accuracy criterion. BMC Bioinformatics 2020;21:294. [PMID: 32921315 PMCID: PMC7489204 DOI: 10.1186/s12859-020-03616-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/11/2020] [Accepted: 06/18/2020] [Indexed: 11/15/2022]

Abstract

Background

The alignment of character sequences is important in bioinformatics. The quality of an alignment is determined by the substitution matrix and by the parameters of the insertion-deletion (gap) penalty function. Substitution matrices are derived from sequence alignments and thus reflect the evolutionary process. In addition to these evolutionary matrices, a large number of matrices of other origins have been obtained. To make an optimal choice of the substitution matrix and the penalty parameters, we conducted a numerical experiment using a representative sample of existing matrices of various types and origins.
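To see concretely how the substitution matrix and the gap penalty parameters jointly determine an alignment, it helps to look at the dynamic programming recurrence itself. Below is a minimal Python sketch of global alignment scoring with affine gaps (Gotoh's three-state recurrence), where a gap of length k costs `gap_open + (k - 1) * gap_extend`. The function name, the toy scoring table, and the penalty values are illustrative assumptions, not the matrices or parameters tested in the study.

```python
from typing import Dict, Tuple

NEG_INF = float("-inf")
SubMatrix = Dict[Tuple[str, str], int]


def global_affine_score(a: str, b: str, sub: SubMatrix, gap_open: int, gap_extend: int) -> float:
    """Best global alignment score of a and b with affine gap penalties (Gotoh).

    A gap of length k costs gap_open + (k - 1) * gap_extend; both penalties are positive.
    M[i][j]: best score of a[:i] vs b[:j] ending with a[i-1] paired to b[j-1]
    X[i][j]: same, but ending with a[i-1] opposite a gap
    Y[i][j]: same, but ending with b[j-1] opposite a gap
    """
    n, m = len(a), len(b)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):  # a[:i] against an empty prefix of b: one leading gap
        X[i][0] = -gap_open - (i - 1) * gap_extend
    for j in range(1, m + 1):  # b[:j] against an empty prefix of a
        Y[0][j] = -gap_open - (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = sub[(a[i - 1], b[j - 1])]
            M[i][j] = pair + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Opening a new gap costs gap_open; extending the current gap costs gap_extend.
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


# Toy scores for illustration only: +2 for identical letters, -1 otherwise.
toy = {(x, y): (2 if x == y else -1) for x in "ACGT" for y in "ACGT"}
print(global_affine_score("ACGT", "ACGT", toy, gap_open=3, gap_extend=1))   # 8
print(global_affine_score("ACGGT", "ACT", toy, gap_open=3, gap_extend=1))   # 2: one gap of length 2
```

Replacing `toy` with a 20 by 20 protein matrix and varying `gap_open` and `gap_extend` changes which alignment scores best, which is why the matrix and the penalty parameters have to be chosen together.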

Results

We tested the classical evolutionary matrix series (PAM, Blosum, VTML, Pfasum), matrices based on structural alignments, a contact energy matrix, and a matrix based on the properties of the genetic code. This study presents results for two types of test set: first, simulated sequences that reflect divergent evolution; second, sequences from BAliBASE. In both cases, we obtained the dependence of alignment quality (Accuracy, Confidence) on the evolutionary distance between the sequences and on the evolutionary distance to which each substitution matrix corresponds. For both local and global alignment, the choice of matrix was optimized jointly with the parameters of the penalty function.
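The abstract scores alignments by Accuracy and Confidence against a reference. A common way to define such scores, and the one assumed here (the paper's exact definitions may differ), is to treat each alignment as a set of aligned residue index pairs: Accuracy is the fraction of reference pairs that the test alignment recovers, and Confidence is the fraction of test pairs that also appear in the reference. The helper names `aligned_pairs` and `accuracy_confidence` are hypothetical.

```python
from typing import Set, Tuple

Pairs = Set[Tuple[int, int]]


def aligned_pairs(row_a: str, row_b: str, gap: str = "-") -> Pairs:
    """Residue index pairs (i, j) that two gapped alignment rows place in the same column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs: Pairs = set()
    i = j = 0  # residue counters, advanced only on non-gap characters
    for ca, cb in zip(row_a, row_b):
        if ca != gap and cb != gap:
            pairs.add((i, j))
        if ca != gap:
            i += 1
        if cb != gap:
            j += 1
    return pairs


def accuracy_confidence(test: Pairs, reference: Pairs) -> Tuple[float, float]:
    """Accuracy = shared / |reference|, Confidence = shared / |test| (assumed definitions)."""
    shared = len(test & reference)
    accuracy = shared / len(reference) if reference else 0.0
    confidence = shared / len(test) if test else 0.0
    return accuracy, confidence


# Hypothetical reference and test alignments of the same two sequences (ACGT and AGCT).
reference = aligned_pairs("ACG-T", "A-GCT")
test = aligned_pairs("ACGT", "AGCT")
print(accuracy_confidence(test, reference))
```

Under these definitions Accuracy behaves like sensitivity and Confidence like precision, so an aligner that pairs every residue can score well on the first while scoring poorly on the second.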

We found that the best alignment quality is achieved with matrices corresponding to the largest evolutionary distance. These matrices prove to be universal, i.e. suitable for aligning sequences separated by both large and small evolutionary distances. We also analysed how the correlation coefficients of the matrices relate to alignment quality: matrices that produce high-quality alignments have above-average correlation values, but the converse is not true.
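The abstract does not state how the correlation coefficient of two matrices is computed. One simple choice, assumed here, is the Pearson correlation over the entries of the upper triangle (diagonal included) of two symmetric matrices indexed by the same alphabet, so that each off-diagonal pair is counted once. The function names and the toy matrices are hypothetical.

```python
import math
from itertools import combinations_with_replacement
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]


def upper_triangle(m: Matrix, alphabet: str) -> List[float]:
    """Entries m[x, y] for x <= y in alphabet order (a symmetric matrix is assumed)."""
    return [m[(x, y)] for x, y in combinations_with_replacement(alphabet, 2)]


def pearson(u: List[float], v: List[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def matrix_correlation(m1: Matrix, m2: Matrix, alphabet: str) -> float:
    return pearson(upper_triangle(m1, alphabet), upper_triangle(m2, alphabet))


# Toy 3-letter matrices: the second is a positive linear rescaling of the first.
alphabet = "ACG"
identity_like = {(x, y): (4.0 if x == y else -1.0) for x in alphabet for y in alphabet}
rescaled = {k: 2.0 * v + 1.0 for k, v in identity_like.items()}
print(matrix_correlation(identity_like, rescaled, alphabet))
```

Pearson correlation is unchanged by positive linear rescaling, which is convenient here because published matrices are often expressed in different logarithmic units and rounded to integers.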

Conclusions

This study showed that the best alignment quality is achieved with evolutionary matrices designed for long distances: Gonnet, VTML250, PAM250, MIQS, and Pfasum050. Matrices of non-evolutionary origin that correspond to a large evolutionary distance share the same property. In particular, matrices based on structural data yield alignment quality close to that of evolutionary matrices. This agrees with the idea that spatial structure is more conserved than protein sequence.
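Long-distance matrices such as PAM250 come from the Dayhoff-style construction: a mutation probability matrix for 1 PAM (one accepted point mutation per 100 residues) is raised to the n-th power, and the result is converted to log-odds scores against background frequencies. The sketch below follows that recipe on a made-up two-letter alphabet; the 1-PAM matrix, the frequencies, and the 10·log10 scale are illustrative assumptions, not data from any published matrix, and the function names are hypothetical.

```python
import math
from typing import List, Optional

Mat = List[List[float]]


def matmul(a: Mat, b: Mat) -> Mat:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matpow(m: Mat, power: int) -> Mat:
    """m ** power by repeated squaring (power >= 1)."""
    if power < 1:
        raise ValueError("power must be >= 1")
    result: Optional[Mat] = None
    base = m
    while power:
        if power & 1:
            result = base if result is None else matmul(result, base)
        base = matmul(base, base)
        power >>= 1
    return result


def log_odds(mut: Mat, freqs: List[float], scale: float = 10.0) -> Mat:
    """Scores scale * log10(P(i -> j) / f_j): chance of substitution versus chance of background."""
    n = len(freqs)
    return [[scale * math.log10(mut[i][j] / freqs[j]) for j in range(n)] for i in range(n)]


# Hypothetical 1-PAM matrix: row i gives P(i -> j); each letter changes with probability 0.01.
pam1 = [[0.99, 0.01],
        [0.01, 0.99]]
background = [0.5, 0.5]

for n in (1, 250):
    scores = log_odds(matpow(pam1, n), background)
    print(f"PAM{n}: match {scores[0][0]:.2f}, mismatch {scores[0][1]:.2f}")
```

At PAM1 mismatches are heavily penalized, while at PAM250 the extrapolated probabilities approach the background and mismatch scores become much milder; on a real 20-letter alphabet with unequal rates the effect is less extreme, but the direction is the same.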

Asnicar F, Thomas AM, Beghini F, Mengoni C, Manara S, Manghi P, Zhu Q, Bolzan M, Cumbo F, May U, Sanders JG, Zolfo M, Kopylova E, Pasolli E, Knight R, Mirarab S, Huttenhower C, Segata N. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat Commun 2020;11:2500. [PMID: 32427907 PMCID: PMC7237447 DOI: 10.1038/s41467-020-16366-7] [Citation(s) in RCA: 448] [Impact Index Per Article: 89.6] [Received: 08/17/2019] [Accepted: 04/27/2020] [Indexed: 01/10/2023]

Affiliation(s)

Francesco Asnicar Department CIBIO, University of Trento, Trento, Italy
Andrew Maltez Thomas Department CIBIO, University of Trento, Trento, Italy
Francesco Beghini Department CIBIO, University of Trento, Trento, Italy
Claudia Mengoni Department CIBIO, University of Trento, Trento, Italy
Serena Manara Department CIBIO, University of Trento, Trento, Italy
Paolo Manghi Department CIBIO, University of Trento, Trento, Italy
Qiyun Zhu Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Mattia Bolzan Department CIBIO, University of Trento, Trento, Italy; PreBiomics s.r.l, Trento, Italy
Fabio Cumbo Department CIBIO, University of Trento, Trento, Italy
Uyen May Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
Jon G Sanders Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Cornell Institute for Host-Microbe Interaction and Disease, Cornell University, Ithaca, NY, USA
Moreno Zolfo Department CIBIO, University of Trento, Trento, Italy
Evguenia Kopylova Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Clarity Genomics BVBA, Sint-Michielskaai 34, 2000, Antwerpen, Belgium
Edoardo Pasolli Department CIBIO, University of Trento, Trento, Italy; Department of Agricultural Sciences, University of Naples Federico II, Portici, Italy
Rob Knight Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
Siavash Mirarab Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
Curtis Huttenhower Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nicola Segata Department CIBIO, University of Trento, Trento, Italy

Zhu Q, Mai U, Pfeiffer W, Janssen S, Asnicar F, Sanders JG, Belda-Ferre P, Al-Ghalith GA, Kopylova E, McDonald D, Kosciolek T, Yin JB, Huang S, Salam N, Jiao JY, Wu Z, Xu ZZ, Cantrell K, Yang Y, Sayyari E, Rabiee M, Morton JT, Podell S, Knights D, Li WJ, Huttenhower C, Segata N, Smarr L, Mirarab S, Knight R. Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea. Nat Commun 2019;10:5477. [PMID: 31792218 PMCID: PMC6889312 DOI: 10.1038/s41467-019-13443-4] [Citation(s) in RCA: 192] [Impact Index Per Article: 32.0] [Received: 06/08/2019] [Accepted: 11/06/2019] [Indexed: 11/10/2022]

Affiliation(s)

Qiyun Zhu Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Uyen Mai Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Wayne Pfeiffer San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA
Stefan Janssen Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Algorithmic Bioinformatics, Department of Biology and Chemistry, Justus Liebig University Gießen, Giessen, Germany
Francesco Asnicar Department CIBIO, University of Trento, Trento, Italy
Jon G Sanders Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Pedro Belda-Ferre Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Gabriel A Al-Ghalith Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Evguenia Kopylova Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Daniel McDonald Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Tomasz Kosciolek Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
John B Yin Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA; Department of Mathematics, University of California San Diego, La Jolla, CA, USA
Shi Huang Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Single-Cell Center, Qingdao Institute of Bioenergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong, China
Nimaichand Salam State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Jian-Yu Jiao State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Zijun Wu Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Division of Biological Sciences, University of California San Diego, La Jolla, CA, USA
Zhenjiang Z Xu Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
Kalen Cantrell Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Yimeng Yang Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Erfan Sayyari Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, CA, USA
Maryam Rabiee Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
James T Morton Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
Sheila Podell Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Dan Knights Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, USA
Wen-Jun Li State Key Laboratory of Biocontrol and Guangdong Key Laboratory of Plant Resources, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
Curtis Huttenhower Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA; The Broad Institute of MIT and Harvard, Cambridge, MA, USA
Nicola Segata Department CIBIO, University of Trento, Trento, Italy
Larry Smarr Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA; California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, CA, USA
Siavash Mirarab Malopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
Rob Knight Department of Pediatrics, University of California San Diego, La Jolla, CA, USA; Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA; Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA; Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
