Reference Citation Analysis

For: Littmann M, Bordin N, Heinzinger M, Schütze K, Dallago C, Orengo C, Rost B. Clustering FunFams using sequence embeddings improves EC purity. Bioinformatics 2021;37:3449-3455. [PMID: 33978744 PMCID: PMC8545299 DOI: 10.1093/bioinformatics/btab371] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/25/2021] [Revised: 04/02/2021] [Accepted: 05/11/2021] [Indexed: 12/05/2022]

Cited by Other Article(s)

Lin W, Wells J, Wang Z, Orengo C, Martin ACR. Enhancing missense variant pathogenicity prediction with protein language models using VariPred. Sci Rep 2024;14:8136. [PMID: 38584172 PMCID: PMC10999449 DOI: 10.1038/s41598-024-51489-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Accepted: 01/05/2024] [Indexed: 04/09/2024]

Urhan A, Cosma BM, Earl AM, Manson AL, Abeel T. SAP: Synteny-aware gene function prediction for bacteria using protein embeddings. bioRxiv 2023:2023.05.02.539034. [PMID: 37205418 PMCID: PMC10187222 DOI: 10.1101/2023.05.02.539034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/21/2023]

Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023;48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]

AlphaFold2 reveals commonalities and novelties in protein structure space for 21 model organisms. Commun Biol 2023;6:160. [PMID: 36755055 PMCID: PMC9908985 DOI: 10.1038/s42003-023-04488-9] [Citation(s) in RCA: 41] [Impact Index Per Article: 20.5] [Received: 07/06/2022] [Accepted: 01/16/2023] [Indexed: 02/10/2023]

Adeyelu T, Bordin N, Waman VP, Sadlej M, Sillitoe I, Moya-Garcia AA, Orengo CA. KinFams: De-Novo Classification of Protein Kinases Using CATH Functional Units. Biomolecules 2023;13:277. [PMID: 36830646 PMCID: PMC9953599 DOI: 10.3390/biom13020277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2022] [Revised: 01/24/2023] [Accepted: 01/26/2023] [Indexed: 02/05/2023]

Olenyi T, Marquet C, Heinzinger M, Kröger B, Nikolova T, Bernhofer M, Sändig P, Schütze K, Littmann M, Mirdita M, Steinegger M, Dallago C, Rost B. LambdaPP: Fast and accessible protein-specific phenotype predictions. Protein Sci 2023;32:e4524. [PMID: 36454227 PMCID: PMC9793974 DOI: 10.1002/pro.4524] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/04/2022] [Revised: 11/09/2022] [Accepted: 11/21/2022] [Indexed: 12/04/2022]

Affiliation(s)

Tobias Olenyi TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Céline Marquet TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Michael Heinzinger TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Benjamin Kröger TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Tiha Nikolova TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Michael Bernhofer TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Philip Sändig TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Konstantin Schütze TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Maria Littmann TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Milot Mirdita School of Biological Sciences, Seoul National University, Seoul, South Korea
Martin Steinegger School of Biological Sciences, Seoul National University, Seoul, South Korea; Korea Artificial Intelligence Institute, Seoul National University, Seoul, South Korea; Korea Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
Christian Dallago TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; VantAI, New York, USA
Burkhard Rost TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany & TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany

Kennedy EN, Foster CA, Barr SA, Bourret RB. General strategies for using amino acid sequence data to guide biochemical investigation of protein function. Biochem Soc Trans 2022;50:1847-1858. [PMID: 36416676 PMCID: PMC10257402 DOI: 10.1042/bst20220849] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 11/04/2022] [Accepted: 11/09/2022] [Indexed: 11/24/2022]

Abstract

The rapid increase of '-omics' data warrants the reconsideration of experimental strategies to investigate general protein function. Studying individual members of a protein family is likely insufficient to provide a complete mechanistic understanding of family functions, especially for diverse families with thousands of known members. Strategies that exploit large amounts of available amino acid sequence data can inspire and guide biochemical experiments, generating broadly applicable insights into a given family. Here we review several methods that utilize abundant sequence data to focus experimental efforts and identify features truly representative of a protein family or domain. First, coevolutionary relationships between residues within primary sequences can be successfully exploited to identify structurally and/or functionally important positions for experimental investigation. Second, functionally important variable residue positions typically occupy a limited sequence space, a property useful for guiding biochemical characterization of the effects of the most physiologically and evolutionarily relevant amino acids. Third, amino acid sequence variation within domains shared between different protein families can be used to sort a particular domain into multiple subtypes, inspiring further experimental designs. Although generally applicable to any kind of protein domain because they depend solely on amino acid sequences, the second and third approaches are reviewed in detail because they appear to have been used infrequently and offer immediate opportunities for new advances. Finally, we speculate that future technologies capable of analyzing and manipulating conserved and variable aspects of the three-dimensional structures of a protein family could lead to broad insights not attainable by current methods.

Schütze K, Heinzinger M, Steinegger M, Rost B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. Front Bioinform 2022;2:1033775. [PMID: 36466147 PMCID: PMC9714024 DOI: 10.3389/fbinf.2022.1033775] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/31/2022] [Accepted: 10/31/2022] [Indexed: 11/29/2023]

Ilzhöfer D, Heinzinger M, Rost B. SETH predicts nuances of residue disorder from protein embeddings. Front Bioinform 2022;2:1019597. [PMID: 36304335 PMCID: PMC9580958 DOI: 10.3389/fbinf.2022.1019597] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 08/15/2022] [Accepted: 09/20/2022] [Indexed: 11/07/2022]

Abstract

Predictions for millions of protein three-dimensional structures are only a few clicks away since the release of AlphaFold2 results for UniProt. However, many proteins have so-called intrinsically disordered regions (IDRs) that do not adopt unique structures in isolation. These IDRs are associated with several diseases, including Alzheimer's Disease. We showed that three recent disorder measures of AlphaFold2 predictions (pLDDT, "experimentally resolved" prediction and "relative solvent accessibility") correlated to some extent with IDRs. However, expert methods predict IDRs more reliably by combining complex machine learning models with expert-crafted input features and evolutionary information from multiple sequence alignments (MSAs). MSAs are not always available, especially for IDRs, and are computationally expensive to generate, limiting the scalability of the associated tools. Here, we present the novel method SETH that predicts residue disorder from embeddings generated by the protein Language Model ProtT5, which explicitly only uses single sequences as input. Thereby, our method, relying on a relatively shallow convolutional neural network, outperformed much more complex solutions while being much faster, allowing to create predictions for the human proteome in about 1 hour on a consumer-grade PC with one NVIDIA GeForce RTX 3060. Trained on a continuous disorder scale (CheZOD scores), our method captured subtle variations in disorder, thereby providing important information beyond the binary classification of most methods. High performance paired with speed revealed that SETH's nuanced disorder predictions for entire proteomes capture aspects of the evolution of organisms. Additionally, SETH could also be used to filter out regions or proteins with probable low-quality AlphaFold2 3D structures to prioritize running the compute-intensive predictions for large data sets. SETH is freely publicly available at: https://github.com/Rostlab/SETH.

Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Embeddings from protein language models predict conservation and variant effects. Hum Genet 2022;141:1629-1647. [PMID: 34967936 PMCID: PMC8716573 DOI: 10.1007/s00439-021-02411-y] [Citation(s) in RCA: 63] [Impact Index Per Article: 21.0] [Received: 06/01/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]

Abstract

The emergence of SARS-CoV-2 variants stressed the demand for tools allowing to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient, MCC, for ProtT5 embeddings of 0.596 ± 0.006 vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.

Affiliation(s)

Céline Marquet Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Michael Heinzinger Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Tobias Olenyi Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Christian Dallago Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Kyra Erckert Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Michael Bernhofer Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Dmitrii Nechaev Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany; TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany

Organizing the bacterial annotation space with amino acid sequence embeddings. BMC Bioinformatics 2022;23:385. [PMID: 36151519 PMCID: PMC9502642 DOI: 10.1186/s12859-022-04930-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2022] [Accepted: 08/11/2022] [Indexed: 11/10/2022]

de Crécy-Lagard V, Amorin de Hegedus R, Arighi C, Babor J, Bateman A, Blaby I, Blaby-Haas C, Bridge AJ, Burley SK, Cleveland S, Colwell LJ, Conesa A, Dallago C, Danchin A, de Waard A, Deutschbauer A, Dias R, Ding Y, Fang G, Friedberg I, Gerlt J, Goldford J, Gorelik M, Gyori BM, Henry C, Hutinet G, Jaroch M, Karp PD, Kondratova L, Lu Z, Marchler-Bauer A, Martin MJ, McWhite C, Moghe GD, Monaghan P, Morgat A, Mungall CJ, Natale DA, Nelson WC, O’Donoghue S, Orengo C, O’Toole KH, Radivojac P, Reed C, Roberts RJ, Rodionov D, Rodionova IA, Rudolf JD, Saleh L, Sheynkman G, Thibaud-Nissen F, Thomas PD, Uetz P, Vallenet D, Carter EW, Weigele PR, Wood V, Wood-Charlson EM, Xu J. A roadmap for the functional annotation of protein families: a community perspective. Database (Oxford) 2022;2022:baac062. [PMID: 35961013 PMCID: PMC9374478 DOI: 10.1093/database/baac062] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 06/07/2022] [Revised: 06/28/2022] [Accepted: 08/03/2022] [Indexed: 12/23/2022]

Affiliation(s)

Valérie de Crécy-Lagard Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Rocio Amorin de Hegedus Genetics Institute, University of Florida, Gainesville, FL 32611, USA
Cecilia Arighi Department of Computer and Information Sciences, University of Delaware, Newark, DE 19713, USA
Jill Babor Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Alex Bateman European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
Ian Blaby US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Crysten Blaby-Haas Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
Alan J Bridge Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
Stephen K Burley RCSB Protein Data Bank, Institute for Quantitative Biomedicine, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, USA
Stacey Cleveland Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Lucy J Colwell Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
Ana Conesa Spanish National Research Council, Institute for Integrative Systems Biology, Paterna, Valencia 46980, Spain
Christian Dallago TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, i12, Boltzmannstr. 3, Garching/Munich 85748, Germany
Antoine Danchin School of Biomedical Sciences, Li KaShing Faculty of Medicine, The University of Hong Kong, 21 Sassoon Road, Pokfulam, SAR Hong Kong 999077, China
Anita de Waard Research Collaboration Unit, Elsevier, Jericho, VT 05465, USA
Adam Deutschbauer Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Raquel Dias Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Yousong Ding Department of Medicinal Chemistry, Center for Natural Products, Drug Discovery and Development, University of Florida, Gainesville, FL 32610, USA
Gang Fang NYU-Shanghai, Shanghai 200120, China
Iddo Friedberg Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA 50011, USA
John Gerlt Institute for Genomic Biology and Departments of Biochemistry and Chemistry, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
Joshua Goldford Physics of Living Systems, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Mark Gorelik Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Benjamin M Gyori Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
Christopher Henry Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA
Geoffrey Hutinet Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Marshall Jaroch Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Peter D Karp Bioinformatics Research Group, SRI International, Menlo Park, CA 94025, USA
Liudmyla Kondratova Genetics Institute, University of Florida, Gainesville, FL 32611, USA
Zhiyong Lu National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
Aron Marchler-Bauer National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
Maria-Jesus Martin European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton CB10 1SD, UK
Claire McWhite Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08540, USA
Gaurav D Moghe Plant Biology Section, School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
Paul Monaghan Department of Agricultural Education and Communication, University of Florida, Gainesville, FL 32611, USA
Anne Morgat Swiss-Prot group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva 4 CH-1211, Switzerland
Christopher J Mungall Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Darren A Natale Georgetown University Medical Center, Washington, DC 20007, USA
William C Nelson Biological Sciences Division, Pacific Northwest National Laboratories, Richland, WA 99354, USA
Seán O’Donoghue School of Biotechnology and Biomolecular Sciences, University of NSW, Sydney, NSW 2052, Australia
Christine Orengo Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
Katherine H O’Toole New England Biolabs, Ipswich, MA 01938, USA
Predrag Radivojac Khoury College of Computer Sciences, Northeastern University, Boston, MA 02115, USA
Colbie Reed Department of Microbiology and Cell Sciences, University of Florida, Gainesville, FL 32611, USA
Richard J Roberts New England Biolabs, Ipswich, MA 01938, USA
Dmitri Rodionov Sanford Burnham Prebys Medical Discovery Institute, La Jolla, CA 92037, USA
Irina A Rodionova Department of Bioengineering, Division of Engineering, University of California at San Diego, La Jolla, CA 92093-0412, USA
Jeffrey D Rudolf Department of Chemistry, University of Florida, Gainesville, FL 32611, USA
Lana Saleh New England Biolabs, Ipswich, MA 01938, USA
Gloria Sheynkman Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA, USA
Francoise Thibaud-Nissen National Center for Biotechnology Information (NCBI), National Library of Medicine (NLM), National Institutes of Health (NIH), 8600 Rockville Pike, Bethesda, MD 20817, USA
Paul D Thomas Department of Population and Public Health Sciences, University of Southern California, Los Angeles, CA 90033, USA
Peter Uetz Center for Biological Data Science, Virginia Commonwealth University, Richmond, VA 23284, USA
David Vallenet LABGeM, Génomique Métabolique, CEA, Genoscope, Institut François Jacob, Université d’Évry, Université Paris-Saclay, CNRS, Evry 91057, France
Erica Watson Carter Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA
Peter R Weigele New England Biolabs, Ipswich, MA 01938, USA
Valerie Wood Department of Biochemistry, University of Cambridge, Cambridge CB2 1GA, UK
Elisha M Wood-Charlson Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Jin Xu Department of Plant Pathology, University of Florida Citrus Research and Education Center, 700 Experiment Station Rd., Lake Alfred, FL 33850, USA

Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022;4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022]

Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 2022;30:1169-1177.e4. [DOI: 10.1016/j.str.2022.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/09/2021] [Revised: 02/25/2022] [Accepted: 04/29/2022] [Indexed: 01/27/2023]