Reference Citation Analysis

Searched Name: Burkhard Rost

Total Articles: 309 (from Reference Citation Analysis)
Article PDFs: 93
Cited by > 0: 279

Weissenow K, Rost B. Rendering protein mutation movies with MutAmore. BMC Bioinformatics 2023;24:469. [PMID: 38087198 PMCID: PMC10714560 DOI: 10.1186/s12859-023-05610-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Accepted: 12/08/2023] [Indexed: 12/18/2023] Open

Abakarova M, Marquet C, Rera M, Rost B, Laine E. Alignment-based Protein Mutational Landscape Prediction: Doing More with Less. Genome Biol Evol 2023;15:evad201. [PMID: 37936309 PMCID: PMC10653582 DOI: 10.1093/gbe/evad201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2023] [Revised: 10/27/2023] [Accepted: 11/01/2023] [Indexed: 11/09/2023] Open

Koludarov I, Velasque M, Senoner T, Timm T, Greve C, Hamadou AB, Gupta DK, Lochnit G, Heinzinger M, Vilcinskas A, Gloag R, Harpur BA, Podsiadlowski L, Rost B, Jackson TNW, Dutertre S, Stolle E, von Reumont BM. Prevalent bee venom genes evolved before the aculeate stinger and eusociality. BMC Biol 2023;21:229. [PMID: 37867198 PMCID: PMC10591384 DOI: 10.1186/s12915-023-01656-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/27/2023] [Accepted: 06/29/2023] [Indexed: 10/24/2023] Open

Affiliation(s)

Ivan Koludarov Justus Liebig University of Gießen, Institute for Insect Biotechnology, Heinrich-Buff-Ring 58, 35392, Giessen, Germany. Department of Informatics, Bioinformatics and Computational Biology, i12, Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany.
Mariana Velasque Genomics & Regulatory Systems Unit, Okinawa Institute of Science & Technology, Tancha, Okinawa, 1919, Japan
Tobias Senoner Department of Informatics, Bioinformatics and Computational Biology, i12, Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
Thomas Timm Protein Analytics, Institute of Biochemistry, Justus Liebig University, Friedrichstrasse 24, 35392, Giessen, Germany
Carola Greve LOEWE Centre for Translational Biodiversity Genomics (TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
Alexander Ben Hamadou LOEWE Centre for Translational Biodiversity Genomics (TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
Deepak Kumar Gupta LOEWE Centre for Translational Biodiversity Genomics (TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
Günter Lochnit Protein Analytics, Institute of Biochemistry, Justus Liebig University, Friedrichstrasse 24, 35392, Giessen, Germany
Michael Heinzinger Department of Informatics, Bioinformatics and Computational Biology, i12, Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
Andreas Vilcinskas Justus Liebig University of Gießen, Institute for Insect Biotechnology, Heinrich-Buff-Ring 58, 35392, Giessen, Germany Fraunhofer Institute for Molecular Biology and Applied Ecology, Department of Bioresources, Ohlebergsweg 12, 35392, Giessen, Germany
Rosalyn Gloag School of Life and Environmental Sciences, The University of Sydney, Sydney, NSW, 2006, Australia
Brock A Harpur Department of Entomology, Purdue University, 901 W. State Street, West Lafayette, IN, 47907, USA
Lars Podsiadlowski Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Centre of Molecular Biodiversity Research, Adenauerallee 160, 53113, Bonn, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology, i12, Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany
Timothy N W Jackson Australian Venom Research Unit, Department of Biochemistry and Pharmacology, University of Melbourne, Grattan Street, Parkville, Victoria, 3010, Australia
Sebastien Dutertre IBMM, Université Montpellier, CNRS, ENSCM, 34095, Montpellier, France
Eckart Stolle Leibniz Institute for the Analysis of Biodiversity Change, Zoological Research Museum Alexander Koenig, Centre of Molecular Biodiversity Research, Adenauerallee 160, 53113, Bonn, Germany
Björn M von Reumont LOEWE Centre for Translational Biodiversity Genomics (TBG), Senckenberganlage 25, 60325, Frankfurt, Germany. Faculty of Biological Sciences, Group of Applied Bioinformatics, Goethe University Frankfurt, Max-Von-Laue Str. 13, 60438, Frankfurt, Germany.

Llorián-Salvador Ó, Akhgar J, Pigorsch S, Borm K, Münch S, Bernhardt D, Rost B, Andrade-Navarro MA, Combs SE, Peeken JC. The importance of planning CT-based imaging features for machine learning-based prediction of pain response. Sci Rep 2023;13:17427. [PMID: 37833283 PMCID: PMC10576053 DOI: 10.1038/s41598-023-43768-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 09/28/2023] [Indexed: 10/15/2023] Open

Affiliation(s)

Óscar Llorián-Salvador Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany Department for Bioinformatics and Computational Biology, Informatik 12, Technical University of Munich (TUM), Boltzmannstraße 3, 85748, Garching, Germany Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
Joachim Akhgar Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
Steffi Pigorsch Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
Kai Borm Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
Stefan Münch Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany
Denise Bernhardt Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany
Burkhard Rost Department for Bioinformatics and Computational Biology, Informatik 12, Technical University of Munich (TUM), Boltzmannstraße 3, 85748, Garching, Germany
Miguel A Andrade-Navarro Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
Stephanie E Combs Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany
Jan C Peeken Department of Radiation Oncology, Klinikum Rechts der Isar, Technical University of Munich (TUM), Ismaninger Straße 22, 81675, Munich, Germany. Department of Radiation Sciences (DRS), Institute of Radiation Medicine (IRM), Helmholtz Zentrum, 85764, München, Germany. Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120, Heidelberg, Germany.

Koludarov I, Senoner T, Jackson TNW, Dashevsky D, Heinzinger M, Aird SD, Rost B. Domain loss enabled evolution of novel functions in the snake three-finger toxin gene superfamily. Nat Commun 2023;14:4861. [PMID: 37567881 PMCID: PMC10421932 DOI: 10.1038/s41467-023-40550-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2022] [Accepted: 07/28/2023] [Indexed: 08/13/2023] Open

Foreman SC, Llorián-Salvador O, David DE, Rösner VKN, Rischewski JF, Feuerriegel GC, Kramp DW, Luiken I, Lohse AK, Kiefer J, Mogler C, Knebel C, Jung M, Andrade-Navarro MA, Rost B, Combs SE, Makowski MR, Woertler K, Peeken JC, Gersing AS. Development and Evaluation of MR-Based Radiogenomic Models to Differentiate Atypical Lipomatous Tumors from Lipomas. Cancers (Basel) 2023;15:cancers15072150. [PMID: 37046811 PMCID: PMC10093205 DOI: 10.3390/cancers15072150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 03/10/2023] [Accepted: 03/27/2023] [Indexed: 04/08/2023] Open

Affiliation(s)

Sarah C. Foreman Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Oscar Llorián-Salvador Department of Radiation Oncology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany Department of Informatics, Bioinformatics and Computational Biology—i12, Technische Universität München, Boltzmannstr. 3, 85748 Munich, Germany Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
Diana E. David Department of Informatics, Bioinformatics and Computational Biology—i12, Technische Universität München, Boltzmannstr. 3, 85748 Munich, Germany
Verena K. N. Rösner Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Jon F. Rischewski Department of Diagnostic and Interventional Neuroradiology, University Hospital Munich (LMU), Marchioninistrasse 15, 81377 Munich, Germany
Georg C. Feuerriegel Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Daniel W. Kramp Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Ina Luiken Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Ann-Kathrin Lohse Department of Radiology, University Hospital Munich (LMU), Marchioninistrasse 15, 81377 Munich, Germany
Jurij Kiefer Department of Plastic Surgery, University Hospital Freiburg, University of Freiburg, Hugstetterstraße 55, 79106 Freiburg im Breisgau, Germany
Carolin Mogler Institute of Pathology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Carolin Knebel Department of Orthopedics and Sport Orthopedics, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Matthias Jung Department of Radiology, University Hospital Freiburg, University of Freiburg, Hugstetterstraße 55, 79106 Freiburg im Breisgau, Germany
Miguel A. Andrade-Navarro Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, Hanns-Dieter-Hüsch-Weg 15, 55128 Mainz, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology—i12, Technische Universität München, Boltzmannstr. 3, 85748 Munich, Germany
Stephanie E. Combs Department of Radiation Oncology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Marcus R. Makowski Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Klaus Woertler Department of Radiology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany
Jan C. Peeken Department of Radiation Oncology, Klinikum Rechts der Isar, Technische Universität München, Ismaninger Straße 22, 81675 Munich, Germany Helmholtz Zentrum München, Deutsches Forschungszentrum für Umwelt und Gesundheit, Institute of Radiation Medicine Neuherberg, 85764 Munich, Germany Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, 69120 Heidelberg, Germany
Alexandra S. Gersing Department of Diagnostic and Interventional Neuroradiology, University Hospital Munich (LMU), Marchioninistrasse 15, 81377 Munich, Germany

Bordin N, Dallago C, Heinzinger M, Kim S, Littmann M, Rauer C, Steinegger M, Rost B, Orengo C. Novel machine learning approaches revolutionize protein knowledge. Trends Biochem Sci 2023;48:345-359. [PMID: 36504138 PMCID: PMC10570143 DOI: 10.1016/j.tibs.2022.11.001] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/14/2022] [Revised: 10/24/2022] [Accepted: 11/17/2022] [Indexed: 12/10/2022]

Zatorski N, Sun Y, Elmas A, Dallago C, Karl T, Stein D, Rost B, Huang KL, Walsh M, Schlessinger A. Structural Analysis of Genomic and Proteomic Signatures Reveal Dynamic Expression of Intrinsically Disordered Regions in Breast Cancer and Tissue. bioRxiv 2023:2023.02.23.529755. [PMID: 36865220 PMCID: PMC9980136 DOI: 10.1101/2023.02.23.529755] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/26/2023]

Olenyi T, Marquet C, Heinzinger M, Kröger B, Nikolova T, Bernhofer M, Sändig P, Schütze K, Littmann M, Mirdita M, Steinegger M, Dallago C, Rost B. LambdaPP: Fast and accessible protein-specific phenotype predictions. Protein Sci 2023;32:e4524. [PMID: 36454227 PMCID: PMC9793974 DOI: 10.1002/pro.4524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/04/2022] [Revised: 11/09/2022] [Accepted: 11/21/2022] [Indexed: 12/04/2022]

Affiliation(s)

Tobias Olenyi TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Céline Marquet TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Michael Heinzinger TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Benjamin Kröger TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Tiha Nikolova TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Michael Bernhofer TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching, Germany
Philip Sändig TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Konstantin Schütze TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Maria Littmann TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany
Milot Mirdita School of Biological Sciences, Seoul National University, Seoul, South Korea
Martin Steinegger School of Biological Sciences, Seoul National University, Seoul, South Korea; Korea Artificial Intelligence Institute, Seoul National University, Seoul, South Korea; Korea Institute of Molecular Biology and Genetics, Seoul National University, Seoul, South Korea
Christian Dallago TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; VantAI, New York, USA
Burkhard Rost TUM (Technical University of Munich), Department of Informatics, Bioinformatics & Computational Biology - i12, Garching, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany & TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany

Nallapareddy V, Bordin N, Sillitoe I, Heinzinger M, Littmann M, Waman VP, Sen N, Rost B, Orengo C. CATHe: detection of remote homologues for CATH superfamilies using embeddings from protein language models. Bioinformatics 2023;39:6989624. [PMID: 36648327 PMCID: PMC9887088 DOI: 10.1093/bioinformatics/btad029] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/03/2022] [Revised: 12/07/2022] [Accepted: 01/16/2023] [Indexed: 01/18/2023] Open

Schütze K, Heinzinger M, Steinegger M, Rost B. Nearest neighbor search on embeddings rapidly identifies distant protein relations. Front Bioinform 2022;2:1033775. [PMID: 36466147 PMCID: PMC9714024 DOI: 10.3389/fbinf.2022.1033775] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 08/31/2022] [Accepted: 10/31/2022] [Indexed: 11/29/2023] Open

Ilzhöfer D, Heinzinger M, Rost B. SETH predicts nuances of residue disorder from protein embeddings. Front Bioinform 2022;2:1019597. [PMID: 36304335 PMCID: PMC9580958 DOI: 10.3389/fbinf.2022.1019597] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/15/2022] [Accepted: 09/20/2022] [Indexed: 11/07/2022] Open

Abstract

Predictions for millions of protein three-dimensional structures are only a few clicks away since the release of AlphaFold2 results for UniProt. However, many proteins have so-called intrinsically disordered regions (IDRs) that do not adopt unique structures in isolation. These IDRs are associated with several diseases, including Alzheimer’s Disease. We showed that three recent disorder measures of AlphaFold2 predictions (pLDDT, “experimentally resolved” prediction and “relative solvent accessibility”) correlated to some extent with IDRs. However, expert methods predict IDRs more reliably by combining complex machine learning models with expert-crafted input features and evolutionary information from multiple sequence alignments (MSAs). MSAs are not always available, especially for IDRs, and are computationally expensive to generate, limiting the scalability of the associated tools. Here, we present the novel method SETH that predicts residue disorder from embeddings generated by the protein Language Model ProtT5, which explicitly only uses single sequences as input. Relying on a relatively shallow convolutional neural network, our method outperformed much more complex solutions while being much faster, making it possible to generate predictions for the human proteome in about 1 hour on a consumer-grade PC with one NVIDIA GeForce RTX 3060. Trained on a continuous disorder scale (CheZOD scores), our method captured subtle variations in disorder, thereby providing important information beyond the binary classification of most methods. High performance paired with speed revealed that SETH’s nuanced disorder predictions for entire proteomes capture aspects of the evolution of organisms. Additionally, SETH could also be used to filter out regions or proteins with probable low-quality AlphaFold2 3D structures to prioritize running the compute-intensive predictions for large data sets. SETH is freely available at: https://github.com/Rostlab/SETH.

Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell 2022;44:7112-7127. [PMID: 34232869 DOI: 10.1109/tpami.2021.3095381] [Citation(s) in RCA: 270] [Impact Index Per Article: 135.0] [Indexed: 05/10/2023]

Elnaggar A, Heinzinger M, Dallago C, Rehawi G, Wang Y, Jones L, Gibbs T, Feher T, Angerer C, Steinegger M, Bhowmik D, Rost B. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Trans Pattern Anal Mach Intell 2022. [PMID: 34232869 DOI: 10.1101/2020.07.12.199554] [Citation(s) in RCA: 61] [Impact Index Per Article: 30.5] [Indexed: 05/15/2023]

Marquet C, Heinzinger M, Olenyi T, Dallago C, Erckert K, Bernhofer M, Nechaev D, Rost B. Embeddings from protein language models predict conservation and variant effects. Hum Genet 2022;141:1629-1647. [PMID: 34967936 PMCID: PMC8716573 DOI: 10.1007/s00439-021-02411-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 16.5] [Received: 06/01/2021] [Accepted: 12/06/2021] [Indexed: 12/13/2022]

Abstract

The emergence of SARS-CoV-2 variants stressed the demand for tools to interpret the effect of single amino acid variants (SAVs) on protein function. While Deep Mutational Scanning (DMS) sets continue to expand our understanding of the mutational landscape of single proteins, the results continue to challenge analyses. Protein Language Models (pLMs) use the latest deep learning (DL) algorithms to leverage growing databases of protein sequences. These methods learn to predict missing or masked amino acids from the context of entire sequence regions. Here, we used pLM representations (embeddings) to predict sequence conservation and SAV effects without multiple sequence alignments (MSAs). Embeddings alone predicted residue conservation almost as accurately from single sequences as ConSeq using MSAs (two-state Matthews Correlation Coefficient (MCC) of 0.596 ± 0.006 for ProtT5 embeddings vs. 0.608 ± 0.006 for ConSeq). Inputting the conservation prediction along with BLOSUM62 substitution scores and pLM mask reconstruction probabilities into a simplistic logistic regression (LR) ensemble for Variant Effect Score Prediction without Alignments (VESPA) predicted SAV effect magnitude without any optimization on DMS data. Comparing predictions for a standard set of 39 DMS experiments to other methods (incl. ESM-1v, DeepSequence, and GEMME) revealed our approach as competitive with the state-of-the-art (SOTA) methods using MSA input. No method outperformed all others, neither consistently nor statistically significantly, independently of the performance measure applied (Spearman and Pearson correlation). Finally, we investigated binary effect predictions on DMS experiments for four human proteins. Overall, embedding-based methods have become competitive with methods relying on MSAs for SAV effect prediction at a fraction of the costs in computing/energy. Our method predicted SAV effects for the entire human proteome (~20k proteins) within 40 min on one Nvidia Quadro RTX 8000. All methods and data sets are freely available for local and online execution through bioembeddings.com, https://github.com/Rostlab/VESPA, and PredictProtein.
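
The abstract above reports residue-conservation accuracy as a two-state Matthews Correlation Coefficient. As a minimal sketch of how that measure is computed (this is the standard MCC formula, not code from the VESPA repository; the function name and the toy labels are made up for illustration):

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Two-state MCC for binary labels (e.g. 1 = conserved, 0 = variable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: MCC is set to 0 when a row or column of the
        # confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical per-residue labels, not data from the paper.
observed = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 0, 1, 1, 0]
mcc = matthews_corrcoef(observed, predicted)
print(f"MCC = {mcc:.3f}")
```

MCC ranges from -1 (every prediction inverted) through 0 (no better than chance) to 1 (perfect agreement), which is why values such as 0.596 and 0.608 can be compared directly.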

Affiliation(s)

Céline Marquet Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany. TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
Michael Heinzinger Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Tobias Olenyi Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Christian Dallago Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Kyra Erckert Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Michael Bernhofer Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Dmitrii Nechaev Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Burkhard Rost Department of Informatics, Bioinformatics and Computational Biology - i12, TUM-Technical University of Munich, Boltzmannstr. 3, Garching, 85748, Munich, Germany Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, Garching, 85748, Munich, Germany TUM School of Life Sciences Weihenstephan (TUM-WZW), Alte Akademie 8, Freising, Germany

Foley G, Mora A, Ross CM, Bottoms S, Sützl L, Lamprecht ML, Zaugg J, Essebier A, Balderson B, Newell R, Thomson RES, Kobe B, Barnard RT, Guddat L, Schenk G, Carsten J, Gumulya Y, Rost B, Haltrich D, Sieber V, Gillam EMJ, Bodén M. Engineering indel and substitution variants of diverse and ancient enzymes using Graphical Representation of Ancestral Sequence Predictions (GRASP). PLoS Comput Biol 2022;18:e1010633. [PMID: 36279274 PMCID: PMC9632902 DOI: 10.1371/journal.pcbi.1010633] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 01/07/2022] [Revised: 11/03/2022] [Accepted: 10/04/2022] [Indexed: 11/06/2022] Open

Affiliation(s)

Gabriel Foley School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Ariane Mora School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Connie M. Ross School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Scott Bottoms Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany
Leander Sützl Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
Marnie L. Lamprecht School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Julian Zaugg School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Alexandra Essebier School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Brad Balderson School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Rhys Newell School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Raine E. S. Thomson School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Bostjan Kobe School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia Institute for Molecular Bioscience and Australian Infectious Diseases Research Centre, The University of Queensland, Brisbane, Australia
Ross T. Barnard School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Luke Guddat School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Gerhard Schenk School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia Sustainable Minerals Institute, The University of Queensland, Brisbane, Australia
Jörg Carsten Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
Yosephine Gumulya School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Burkhard Rost Fakultät für Informatik, Technische Universität München, Munich, Germany
Dietmar Haltrich Institut für Lebensmitteltechnologie, Universität für Bodenkultur Wien, Vienna, Austria
Volker Sieber School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia Campus Straubing for Biotechnology and Sustainability, Technische Universität München, Straubing, Germany Zentralinstitut für Katalyseforschung, Technische Universität München, Munich, Germany
Elizabeth M. J. Gillam School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
Mikael Bodén School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia

Bernhofer M, Rost B. TMbed: transmembrane proteins predicted through language model embeddings. BMC Bioinformatics 2022;23:326. [PMID: 35941534 PMCID: PMC9358067 DOI: 10.1186/s12859-022-04873-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 06/12/2022] [Accepted: 08/03/2022] [Indexed: 12/30/2022] Open

Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022;4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022] Open

Weissenow K, Heinzinger M, Rost B. Protein language-model embeddings for fast, accurate, and alignment-free protein structure prediction. Structure 2022;30:1169-1177.e4. [DOI: 10.1016/j.str.2022.05.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/09/2021] [Revised: 02/25/2022] [Accepted: 04/29/2022] [Indexed: 01/27/2023]

Lautenbacher L, Samaras P, Muller J, Grafberger A, Shraideh M, Rank J, Fuchs ST, Schmidt TK, The M, Dallago C, Wittges H, Rost B, Krcmar H, Kuster B, Wilhelm M. ProteomicsDB: toward a FAIR open-source resource for life-science research. Nucleic Acids Res 2022;50:D1541-D1552. [PMID: 34791421 PMCID: PMC8728203 DOI: 10.1093/nar/gkab1026] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 09/18/2021] [Revised: 10/12/2021] [Accepted: 10/15/2021] [Indexed: 12/28/2022] Open

Affiliation(s)

Ludwig Lautenbacher Technical University of Munich, Computational Mass Spectrometry, 85354 Freising, Bavaria, Germany
Patroklos Samaras Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany
Julian Muller Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany
Andreas Grafberger Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany
Marwin Shraideh Technical University of Munich, Chair for Information Systems, 85748 Garching, Bavaria, Germany Technical University of Munich, SAP University Competence Center, 85748 Garching, Bavaria, Germany
Johannes Rank Technical University of Munich, Chair for Information Systems, 85748 Garching, Bavaria, Germany Technical University of Munich, SAP University Competence Center, 85748 Garching, Bavaria, Germany
Simon T Fuchs Technical University of Munich, Chair for Information Systems, 85748 Garching, Bavaria, Germany Technical University of Munich, SAP University Competence Center, 85748 Garching, Bavaria, Germany
Tobias K Schmidt Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany
Matthew The Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany
Christian Dallago Technical University of Munich, Department for Bioinformatics and Computational Biology, 85748 Garching, Bavaria, Germany Technical University of Munich, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), 85748 Garching, Bavaria, Germany
Holger Wittges Technical University of Munich, Chair for Information Systems, 85748 Garching, Bavaria, Germany Technical University of Munich, SAP University Competence Center, 85748 Garching, Bavaria, Germany
Burkhard Rost Technical University of Munich, Department for Bioinformatics and Computational Biology, 85748 Garching, Bavaria, Germany Technical University of Munich, Institute for Advanced Study (TUM-IAS), 85748 Freising, Bavaria, Germany
Helmut Krcmar Technical University of Munich, Chair for Information Systems, 85748 Garching, Bavaria, Germany Technical University of Munich, SAP University Competence Center, 85748 Garching, Bavaria, Germany
Bernhard Kuster Technical University of Munich, Chair of Proteomics and Bioanalytics, 85354 Freising, Bavaria, Germany Technical University of Munich, Bavarian Biomolecular Mass Spectrometry Center (BayBioMS), 85354 Freising, Bavaria, Germany
Mathias Wilhelm Technical University of Munich, Computational Mass Spectrometry, 85354 Freising, Bavaria, Germany

Littmann M, Heinzinger M, Dallago C, Weissenow K, Rost B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci Rep 2021;11:23916. [PMID: 34903827 PMCID: PMC8668950 DOI: 10.1038/s41598-021-03431-4] [Citation(s) in RCA: 35] [Impact Index Per Article: 11.7] [Received: 09/06/2021] [Accepted: 12/02/2021] [Indexed: 01/27/2023] Open

Stärk H, Dallago C, Heinzinger M, Rost B. Light attention predicts protein location from the language of life. Bioinform Adv 2021;1:vbab035. [PMID: 36700108 PMCID: PMC9710637 DOI: 10.1093/bioadv/vbab035] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 06/28/2021] [Revised: 09/27/2021] [Accepted: 11/15/2021] [Indexed: 01/28/2023]

Heinzinger M, Dallago C, Rost B. Protein matchmaking through representation learning. Cell Syst 2021;12:948-950. [PMID: 34672956 DOI: 10.1016/j.cels.2021.09.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]

O’Donoghue SI, Schafferhans A, Sikta N, Stolte C, Kaur S, Ho BK, Anderson S, Procter JB, Dallago C, Bordin N, Adcock M, Rost B. SARS-CoV-2 structural coverage map reveals viral protein assembly, mimicry, and hijacking mechanisms. Mol Syst Biol 2021;17:e10079. [PMID: 34519429 PMCID: PMC8438690 DOI: 10.15252/msb.202010079] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 10/24/2020] [Revised: 08/05/2021] [Accepted: 08/06/2021] [Indexed: 01/18/2023] Open

Dallago C, Goldberg T, Andrade-Navarro MA, Alanis-Lobato G, Rost B. Visualizing Human Protein-Protein Interactions and Subcellular Localizations on Cell Images Through CellMap. Curr Protoc Bioinformatics 2021;69:e97. [PMID: 32150354 DOI: 10.1002/cpbi.97] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]

Bernhofer M, Dallago C, Karl T, Satagopam V, Heinzinger M, Littmann M, Olenyi T, Qiu J, Schütze K, Yachdav G, Ashkenazy H, Ben-Tal N, Bromberg Y, Goldberg T, Kajan L, O’Donoghue S, Sander C, Schafferhans A, Schlessinger A, Vriend G, Mirdita M, Gawron P, Gu W, Jarosz Y, Trefois C, Steinegger M, Schneider R, Rost B. PredictProtein - Predicting Protein Structure and Function for 29 Years. Nucleic Acids Res 2021;49:W535-W540. [PMID: 33999203 PMCID: PMC8265159 DOI: 10.1093/nar/gkab354] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 02/23/2021] [Revised: 04/06/2021] [Accepted: 05/10/2021] [Indexed: 12/12/2022] Open

Affiliation(s)

Michael Bernhofer TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
Christian Dallago TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
Tim Karl TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
Venkata Satagopam Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Michael Heinzinger TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
Maria Littmann TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany TUM Graduate School CeDoSIA, Boltzmannstr 11, 85748 Garching, Germany
Tobias Olenyi TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
Jiajun Qiu TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany Department of Otolaryngology Head & Neck Surgery, The Ninth People's Hospital & Ear Institute, School of Medicine & Shanghai Key Laboratory of Translational Medicine on Ear and Nose Diseases, Shanghai Jiao Tong University, Shanghai, China
Konstantin Schütze TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
Guy Yachdav TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
Haim Ashkenazy Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tübingen, Germany The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
Nir Ben-Tal Department of Biochemistry & Molecular Biology, George S. Wise Faculty of Life Sciences, Tel Aviv University, 69978 Tel Aviv, Israel
Yana Bromberg Department of Biochemistry and Microbiology, Rutgers University, New Brunswick, NJ 08901, USA
Tatyana Goldberg TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany
Laszlo Kajan Roche Polska Sp. z o.o., Domaniewska 39B, 02–672 Warsaw, Poland
Sean O’Donoghue Garvan Institute of Medical Research, Sydney, Australia
Chris Sander Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215, USA Department of Cell Biology, Harvard Medical School, Boston, MA 02215, USA Broad Institute of MIT and Harvard, Boston, MA 02142, USA
Andrea Schafferhans TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany HSWT (Hochschule Weihenstephan Triesdorf \| University of Applied Sciences), Department of Bioengineering Sciences, Am Hofgarten 10, 85354 Freising, Germany
Avner Schlessinger Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Gerrit Vriend BIPS, Poblacion Baco, Mindoro, Philippines
Milot Mirdita Quantitative and Computational Biology, Max Planck Institute for Biophysical Chemistry, Göttingen, Germany
Piotr Gawron Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Wei Gu Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Yohan Jarosz Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Christophe Trefois Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Martin Steinegger School of Biological Sciences, Seoul National University, Seoul, South Korea Artificial Intelligence Institute, Seoul National University, Seoul, South Korea
Reinhard Schneider Luxembourg Centre For Systems Biomedicine (LCSB), University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg ELIXIR Luxembourg (ELIXIR-LU) Node, University of Luxembourg, Campus Belval, House of Biomedicine II, 6 avenue du Swing, L-4367 Belvaux, Luxembourg
Burkhard Rost TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr 3, 85748 Garching/Munich, Germany Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748 Garching/Munich, Germany TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany

Dallago C, Schütze K, Heinzinger M, Olenyi T, Littmann M, Lu AX, Yang KK, Min S, Yoon S, Morton JT, Rost B. Learned Embeddings from Deep Learning to Visualize and Predict Protein Sets. Curr Protoc 2021;1:e113. [PMID: 33961736 DOI: 10.1002/cpz1.113] [Citation(s) in RCA: 32] [Impact Index Per Article: 10.7] [Indexed: 02/03/2023]

Abstract

Models from machine learning (ML) or artificial intelligence (AI) increasingly assist in guiding experimental design and decision making in molecular biology and medicine. Recently, Language Models (LMs) have been adapted from Natural Language Processing (NLP) to encode the implicit language written in protein sequences. Protein LMs show enormous potential in generating descriptive representations (embeddings) for proteins from just their sequences, in a fraction of the time with respect to previous approaches, yet with comparable or improved predictive ability. Researchers have trained a variety of protein LMs that are likely to illuminate different angles of the protein language. By leveraging the bio_embeddings pipeline and modules, simple and reproducible workflows can be laid out to generate protein embeddings and rich visualizations. Embeddings can then be leveraged as input features through machine learning libraries to develop methods predicting particular aspects of protein function and structure. Beyond the workflows included here, embeddings have been leveraged as proxies to traditional homology-based inference and even to align similar protein sequences. A wealth of possibilities remain for researchers to harness through the tools provided in the following protocols. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. 
The following protocols are included in this manuscript:
Basic Protocol 1: Generic use of the bio_embeddings pipeline to plot protein sequences and annotations
Basic Protocol 2: Generate embeddings from protein sequences using the bio_embeddings pipeline
Basic Protocol 3: Overlay sequence annotations onto a protein space visualization
Basic Protocol 4: Train a machine learning classifier on protein embeddings
Alternate Protocol 1: Generate 3D instead of 2D visualizations
Alternate Protocol 2: Visualize protein solubility instead of protein subcellular localization
Support Protocol: Join embedding generation and sequence space visualization in a pipeline.
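
The idea behind training a classifier on protein embeddings can be illustrated with a minimal sketch. It does not use the bio_embeddings API and is not taken from the protocols: the 3-dimensional vectors, the location labels, and the helper names (`fit_centroids`, `predict`) are all hypothetical, and a nearest-centroid rule stands in for whatever classifier the protocol actually trains.

```python
import math
from collections import defaultdict

def centroid(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def fit_centroids(embeddings, labels):
    """Group embeddings by label and compute one centroid per label."""
    groups = defaultdict(list)
    for vector, label in zip(embeddings, labels):
        groups[label].append(vector)
    return {label: centroid(vectors) for label, vectors in groups.items()}

def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))

# Toy per-protein embeddings (hypothetical values; real embeddings have
# hundreds to thousands of dimensions).
train_embeddings = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.1, 0.9, 0.8],
    [0.0, 1.0, 0.9],
]
train_labels = ["nucleus", "nucleus", "membrane", "membrane"]

centroids = fit_centroids(train_embeddings, train_labels)
prediction = predict(centroids, [0.8, 0.2, 0.1])
print(prediction)
```

In practice the embeddings would be read from the pipeline's output files and passed to a machine learning library; the point here is only that fixed-length vectors per protein can serve directly as input features.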

Affiliation(s)

Christian Dallago TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
Konstantin Schütze TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany
Michael Heinzinger TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
Tobias Olenyi TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany
Maria Littmann TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Garching/Munich, Germany
Amy X Lu Department of Computer Science, University of Toronto, Toronto, Canada & Vector Institute
Kevin K Yang Microsoft Research New England, Cambridge, Massachusetts
Seonwoo Min Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea
Sungroh Yoon Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
James T Morton Center for Computational Biology, Flatiron Institute, New York, New York
Burkhard Rost TUM (Technical University of Munich) Department of Informatics, Bioinformatics & Computational Biology, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Freising, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, New York, New York; New York Consortium on Membrane Protein Structure (NYCOMPS), New York, New York

Littmann M, Bordin N, Heinzinger M, Schütze K, Dallago C, Orengo C, Rost B. Clustering FunFams using sequence embeddings improves EC purity. Bioinformatics 2021;37:3449-3455. [PMID: 33978744 PMCID: PMC8545299 DOI: 10.1093/bioinformatics/btab371] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 01/25/2021] [Revised: 04/02/2021] [Accepted: 05/11/2021] [Indexed: 12/05/2022] Open

Abstract

Motivation

Classifying proteins into functional families can improve our understanding of protein function and can allow transferring annotations within one family. For this, functional families need to be ‘pure’, i.e., contain only proteins with identical function. Functional Families (FunFams) cluster proteins within CATH superfamilies into such groups of proteins sharing function. 11% of all FunFams (22 830 of 203 639) contain EC annotations and of those, 7% (1526 of 22 830) have inconsistent functional annotations.

Results

We propose an approach to further cluster FunFams into functionally more consistent sub-families by encoding their sequences through embeddings. These embeddings originate from language models transferring knowledge gained from predicting missing amino acids in a sequence (ProtBERT) and have been further optimized to distinguish between proteins belonging to the same or a different CATH superfamily (PB-Tucker). Using distances between embeddings and DBSCAN to cluster FunFams and identify outliers doubled the number of pure clusters per FunFam compared to random clustering. Our approach was not limited to FunFams but also succeeded on families created using sequence similarity alone. Complementing EC annotations, we observed similar results for binding annotations. Thus, we expect an increased purity also for other aspects of function. Our results can help generate FunFams; the resulting clusters with improved functional consistency allow more reliable inference of annotations. We expect this approach to succeed equally for any other grouping of proteins by their phenotypes.
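
As an illustration of the clustering step described above, the following is a minimal plain-Python sketch of DBSCAN applied to toy embedding vectors. It is not the authors' code (that is in the GitHub repository named in the abstract); the 2-dimensional points, the Euclidean distance, and the `eps` and `min_samples` values are assumptions chosen only to make the example small. Points labeled `-1` are the outliers.

```python
import math

def dbscan(points, eps, min_samples):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for an outlier."""
    n = len(points)
    # Each neighborhood includes the point itself (distance 0 <= eps).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n  # None means "not visited yet"
    cluster_id = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional outlier; may become a border point later
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels

# Toy "embeddings" for seven proteins of one hypothetical family.
toy_embeddings = [
    (0.0, 0.0), (0.0, 0.1), (0.1, 0.0),   # first dense group
    (5.0, 5.0), (5.0, 5.1), (5.1, 5.0),   # second dense group
    (10.0, 0.0),                          # isolated point
]
labels = dbscan(toy_embeddings, eps=0.5, min_samples=2)
print(labels)
```

With real embeddings, a library implementation such as scikit-learn's `DBSCAN` would be the more usual choice; the sketch only shows how dense groups become sub-families while isolated sequences are flagged as outliers.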

Availability and implementation

Code and embeddings are available via GitHub: https://github.com/Rostlab/FunFamsClustering.

Supplementary information

Supplementary data are available at Bioinformatics online.

Littmann M, Heinzinger M, Dallago C, Olenyi T, Rost B. Embeddings from deep learning transfer GO annotations beyond homology. Sci Rep 2021;11:1160. [PMID: 33441905 PMCID: PMC7806674 DOI: 10.1038/s41598-020-80786-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 09/07/2020] [Accepted: 12/24/2020] [Indexed: 11/09/2022] Open

Qiu J, Nechaev D, Rost B. Protein-protein and protein-nucleic acid binding residues important for common and rare sequence variants in human. BMC Bioinformatics 2020;21:452. [PMID: 33050876 PMCID: PMC7557062 DOI: 10.1186/s12859-020-03759-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/01/2020] [Accepted: 09/16/2020] [Indexed: 11/10/2022] Open

Abstract

BACKGROUND

Any two unrelated people differ by about 20,000 missense mutations (also referred to as SAVs: Single Amino acid Variants or missense SNV). Many SAVs have been predicted to strongly affect molecular protein function. Common SAVs (> 5% of population) were predicted to have, on average, more effect on molecular protein function than rare SAVs (< 1% of population). We hypothesized that the prevalence of effect in common over rare SAVs might partially be caused by common SAVs more often occurring at interfaces of proteins with other proteins, DNA, or RNA, thereby creating subgroup-specific phenotypes. We analyzed SAVs from 60,706 people through the lens of two prediction methods, one (SNAP2) predicting the effects of SAVs on molecular protein function, the other (ProNA2020) predicting residues in DNA-, RNA- and protein-binding interfaces.

RESULTS

Three results stood out. Firstly, SAVs predicted to occur at binding interfaces were predicted to more likely affect molecular function than those predicted as not binding (p value < 2.2 × 10^-16). Secondly, for SAVs predicted to occur at binding interfaces, common SAVs were predicted more strongly with effect on protein function than rare SAVs (p value < 2.2 × 10^-16). Restriction to SAVs with experimental annotations confirmed all results, although the resulting subsets were too small to establish statistical significance for any result. Thirdly, the fraction of SAVs predicted at binding interfaces differed significantly between tissues, e.g. urinary bladder tissue was found abundant in SAVs predicted at protein-binding interfaces, and reproductive tissues (ovary, testis, vagina, seminal vesicle and endometrium) in SAVs predicted at DNA-binding interfaces.

CONCLUSIONS

Overall, the results suggested that residues at protein-, DNA-, and RNA-binding interfaces contributed toward predicting that common SAVs more likely affect molecular function than rare SAVs.

Zaucha J, Heinzinger M, Kulandaisamy A, Kataka E, Salvádor ÓL, Popov P, Rost B, Gromiha MM, Zhorov BS, Frishman D. Mutations in transmembrane proteins: diseases, evolutionary insights, prediction and comparison with globular proteins. Brief Bioinform 2020;22:5872174. [PMID: 32672331 DOI: 10.1093/bib/bbaa132] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/14/2020] [Revised: 05/26/2020] [Accepted: 05/28/2020] [Indexed: 12/18/2022] Open

Lai JS, Rost B, Kobe B, Bodén M. Evolutionary model of protein secondary structure capable of revealing new biological relationships. Proteins 2020;88:1251-1259. [PMID: 32394426 DOI: 10.1002/prot.25898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2019] [Revised: 10/24/2019] [Accepted: 04/27/2020] [Indexed: 11/09/2022]

Miller M, Vitale D, Kahn PC, Rost B, Bromberg Y. funtrp: identifying protein positions for variation driven functional tuning. Nucleic Acids Res 2020;47:e142. [PMID: 31584091 PMCID: PMC6868392 DOI: 10.1093/nar/gkz818] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/25/2019] [Revised: 09/05/2019] [Accepted: 09/12/2019] [Indexed: 12/12/2022] Open

Reeb J, Wirth T, Rost B. Variant effect predictions capture some aspects of deep mutational scanning experiments. BMC Bioinformatics 2020;21:107. [PMID: 32183714 PMCID: PMC7077003 DOI: 10.1186/s12859-020-3439-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/12/2019] [Accepted: 03/03/2020] [Indexed: 12/12/2022] Open

Qiu J, Bernhofer M, Heinzinger M, Kemper S, Norambuena T, Melo F, Rost B. ProNA2020 predicts protein-DNA, protein-RNA, and protein-protein binding proteins and residues from sequence. J Mol Biol 2020;432:2428-2443. [PMID: 32142788 DOI: 10.1016/j.jmb.2020.02.026] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 11/28/2019] [Revised: 02/17/2020] [Accepted: 02/23/2020] [Indexed: 11/29/2022]

Affiliation(s)

Jiajun Qiu Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany.
Michael Bernhofer Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
Michael Heinzinger Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and Its Applications (CeDoSIA), Garching, 85748, Germany
Sofie Kemper Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany
Tomas Norambuena Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile
Francisco Melo Molecular Bioinformatics Laboratory, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Santiago, Chile; Institute of Biological and Medical Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile
Burkhard Rost Department of Informatics, I12-Chair of Bioinformatics and Computational Biology, Technical University of Munich (TUM), Boltzmannstrasse 3, 85748, Garching, Munich, Germany; Columbia University, Department of Biochemistry and Molecular Biophysics, 701 West 168th Street, New York, NY, 10032, USA; Institute of Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; Institute for Food and Plant Sciences (WZW) Weihenstephan, Alte Akademie 8, 85354 Freising, Germany

Zaucha J, Heinzinger M, Tarnovskaya S, Rost B, Frishman D. Family-specific analysis of variant pathogenicity prediction tools. NAR Genom Bioinform 2020;2:lqaa014. [PMID: 33575576 PMCID: PMC7671395 DOI: 10.1093/nargab/lqaa014] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/19/2019] [Revised: 02/12/2020] [Accepted: 02/25/2020] [Indexed: 01/01/2023] Open

Littmann M, Selig K, Cohen-Lavi L, Frank Y, Hönigschmid P, Kataka E, Mösch A, Qian K, Ron A, Schmid S, Sorbie A, Szlak L, Dagan-Wiener A, Ben-Tal N, Niv MY, Razansky D, Schuller BW, Ankerst D, Hertz T, Rost B. Validity of machine learning in biology and medicine increased through collaborations across fields of expertise. Nat Mach Intell 2020. [DOI: 10.1038/s42256-019-0139-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 11/09/2022]

Littmann M, Goldberg T, Seitz S, Bodén M, Rost B. Correction to: Detailed prediction of protein sub-nuclear localization. BMC Bioinformatics 2019;20:727. [PMID: 31861997 PMCID: PMC6925513 DOI: 10.1186/s12859-019-3305-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022] Open

Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019;20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Citation(s) in RCA: 205] [Impact Index Per Article: 41.0] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022] Open

Abstract

BACKGROUND

Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here.

RESULTS

We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble proteins (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s. As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, i.e. microbiome or metaproteome analysis.

CONCLUSION

Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available on the level of a single sequence.

Affiliation(s)

Michael Heinzinger Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Ahmed Elnaggar Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Yu Wang Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany
Christian Dallago Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Dmitrii Nechaev Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
Florian Matthes TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany
Burkhard Rost Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany; Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany; TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany; Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA

Zhou N, Jiang Y, Bergquist TR, Lee AJ, Kacsoh BZ, Crocker AW, Lewis KA, Georghiou G, Nguyen HN, Hamid MN, Davis L, Dogan T, Atalay V, Rifaioglu AS, Dalkıran A, Cetin Atalay R, Zhang C, Hurto RL, Freddolino PL, Zhang Y, Bhat P, Supek F, Fernández JM, Gemovic B, Perovic VR, Davidović RS, Sumonja N, Veljkovic N, Asgari E, Mofrad MRK, Profiti G, Savojardo C, Martelli PL, Casadio R, Boecker F, Schoof H, Kahanda I, Thurlby N, McHardy AC, Renaux A, Saidi R, Gough J, Freitas AA, Antczak M, Fabris F, Wass MN, Hou J, Cheng J, Wang Z, Romero AE, Paccanaro A, Yang H, Goldberg T, Zhao C, Holm L, Törönen P, Medlar AJ, Zosa E, Borukhov I, Novikov I, Wilkins A, Lichtarge O, Chi PH, Tseng WC, Linial M, Rose PW, Dessimoz C, Vidulin V, Dzeroski S, Sillitoe I, Das S, Lees JG, Jones DT, Wan C, Cozzetto D, Fa R, Torres M, Warwick Vesztrocy A, Rodriguez JM, Tress ML, Frasca M, Notaro M, Grossi G, Petrini A, Re M, Valentini G, Mesiti M, Roche DB, Reeb J, Ritchie DW, Aridhi S, Alborzi SZ, Devignes MD, Koo DCE, Bonneau R, Gligorijević V, Barot M, Fang H, Toppo S, Lavezzo E, Falda M, Berselli M, Tosatto SCE, Carraro M, Piovesan D, Ur Rehman H, Mao Q, Zhang S, Vucetic S, Black GS, Jo D, Suh E, Dayton JB, Larsen DJ, Omdahl AR, McGuffin LJ, Brackenridge DA, Babbitt PC, Yunes JM, Fontana P, Zhang F, Zhu S, You R, Zhang Z, Dai S, Yao S, Tian W, Cao R, Chandler C, Amezola M, Johnson D, Chang JM, Liao WH, Liu YW, Pascarelli S, Frank Y, Hoehndorf R, Kulmanov M, Boudellioua I, Politano G, Di Carlo S, Benso A, Hakala K, Ginter F, Mehryary F, Kaewphan S, Björne J, Moen H, Tolvanen MEE, Salakoski T, Kihara D, Jain A, Šmuc T, Altenhoff A, Ben-Hur A, Rost B, Brenner SE, Orengo CA, Jeffery CJ, Bosco G, Hogan DA, Martin MJ, O'Donovan C, Mooney SD, Greene CS, Radivojac P, Friedberg I. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biol 2019;20:244. 
[PMID: 31744546 PMCID: PMC6864930 DOI: 10.1186/s13059-019-1835-8] [Citation(s) in RCA: 166] [Impact Index Per Article: 33.2] [Received: 05/16/2019] [Accepted: 09/24/2019] [Indexed: 12/23/2022] Open

Affiliation(s)

Naihui Zhou Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Bioinformatics and Computational Biology, Ames, IA, USA
Yuxiang Jiang Indiana University Bloomington, Bloomington, Indiana, USA
Timothy R Bergquist Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Alexandra J Lee Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA, USA
Balint Z Kacsoh Geisel School of Medicine at Dartmouth, Hanover, NH, USA.,Department of Molecular and Systems Biology, Hanover, NH, USA
Alex W Crocker Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Kimberley A Lewis Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
George Georghiou European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
Huy N Nguyen Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Computer Science, Ames, IA, USA
Md Nafiz Hamid Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA.,Program in Bioinformatics and Computational Biology, Ames, IA, USA
Larry Davis Program in Bioinformatics and Computational Biology, Ames, IA, USA
Tunca Dogan Department of Computer Engineering, Hacettepe University, Ankara, Turkey.,European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
Volkan Atalay Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey
Ahmet S Rifaioglu Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey.,Department of Computer Engineering, Iskenderun Technical University, Hatay, Turkey
Alperen Dalkıran Department of Computer Engineering, Middle East Technical University (METU), Ankara, Turkey
Rengul Cetin Atalay CanSyL, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
Chengxin Zhang Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Rebecca L Hurto Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
Peter L Freddolino Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
Yang Zhang Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
Prajwal Bhat Achira Labs, Bangalore, India
Fran Supek Institute for Research in Biomedicine (IRB Barcelona), Barcelona, Spain.,Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
José M Fernández INB Coordination Unit, Life Sciences Department, Barcelona Supercomputing Center, Barcelona, Catalonia, Spain.,(former) INB GN2, Structural and Computational Biology Programme, Spanish National Cancer Research Centre, Barcelona, Catalonia, Spain
Branislava Gemovic Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
Vladimir R Perovic Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
Radoslav S Davidović Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
Neven Sumonja Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
Nevena Veljkovic Laboratory for Bioinformatics and Computational Chemistry, Institute of Nuclear Sciences VINCA, University of Belgrade, Belgrade, Serbia
Ehsaneddin Asgari Molecular Cell Biomechanics Laboratory, Departments of Bioengineering, University of California Berkeley, Berkeley, CA, USA.,Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Berkeley, CA, USA
Mohammad R K Mofrad Departments of Bioengineering and Mechanical Engineering, Berkeley, CA, USA
Giuseppe Profiti Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy.,National Research Council, IBIOM, Bologna, Italy
Castrense Savojardo Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
Pier Luigi Martelli Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
Rita Casadio Bologna Biocomputing Group, Department of Pharmacy and Biotechnology, University of Bologna, Bologna, Italy
Florian Boecker University of Bonn: INRES Crop Bioinformatics, Bonn, North Rhine-Westphalia, Germany
Heiko Schoof INRES Crop Bioinformatics, University of Bonn, Bonn, Germany
Indika Kahanda Gianforte School of Computing, Montana State University, Bozeman, Montana, USA
Natalie Thurlby University of Bristol, Computer Science, Bristol, Bristol, United Kingdom
Alice C McHardy Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Brunswick, Germany.,RESIST, DFG Cluster of Excellence 2155, Brunswick, Germany
Alexandre Renaux Interuniversity Institute of Bioinformatics in Brussels, Université libre de Bruxelles - Vrije Universiteit Brussel, Brussels, Belgium.,Machine Learning Group, Université libre de Bruxelles, Brussels, Belgium.,Artificial Intelligence lab, Vrije Universiteit Brussel, Brussels, Belgium
Rabie Saidi European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
Julian Gough MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
Alex A Freitas University of Kent, School of Computing, Canterbury, United Kingdom
Magdalena Antczak School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
Fabio Fabris University of Kent, School of Computing, Canterbury, United Kingdom
Mark N Wass School of Biosciences, University of Kent, Canterbury, Kent, United Kingdom
Jie Hou University of Missouri, Computer Science, Columbia, Missouri, USA.,Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
Jianlin Cheng Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, USA
Zheng Wang University of Miami, Coral Gables, Florida, USA
Alfonso E Romero Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
Alberto Paccanaro Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
Haixuan Yang School of Mathematics, Statistics and Applied Mathematics, National University of Ireland, Galway, Galway, Ireland.,Technical University of Munich, Garching, Germany
Tatyana Goldberg Department of Informatics, Bioinformatics & Computational Biology-i12, Technische Universität München, Munich, Germany
Chenguang Zhao Faculty for Informatics, Garching, Germany.,Department for Bioinformatics and Computational Biology, Garching, Germany.,School of Computing Sciences and Computer Engineering, Hattiesburg, Mississippi, USA
Liisa Holm Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
Petri Törönen Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
Alan J Medlar Institute of Biotechnology, Helsinki Institute of Life Sciences, University of Helsinki, Finland, Helsinki, Finland
Elaine Zosa Institute of Biotechnology, University of Helsinki, Helsinki, Finland
Itamar Borukhov Compugen Ltd., Holon, Israel
Ilya Novikov Baylor College of Medicine, Department of Biochemistry and Molecular Biology, Houston, TX, USA
Angela Wilkins Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX, USA
Olivier Lichtarge Baylor College of Medicine, Department of Molecular and Human Genetics, Houston, TX, USA
Po-Han Chi National TsingHua University, Hsinchu, Taiwan
Wei-Cheng Tseng Department of Electrical Engineering in National Tsing Hua University, Hsinchu City, Taiwan
Michal Linial The Hebrew University of Jerusalem, Jerusalem, Israel
Peter W Rose University of California San Diego, San Diego Supercomputer Center, La Jolla, California, USA
Christophe Dessimoz Department of Computational Biology and Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland.,Department of Genetics, Evolution & Environment, and Department of Computer Science, University College London, London, UK.,Swiss Institute of Bioinformatics, Lausanne, Switzerland
Vedrana Vidulin Department of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia
Saso Dzeroski Jozef Stefan Institute, Ljubljana, Slovenia.,Jozef Stefan International Postgraduate School, Ljubljana, Slovenia
Ian Sillitoe Research Department of Structural and Molecular Biology, University College London, London, England
Sayoni Das Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
Jonathan Gill Lees Research Department of Structural and Molecular Biology, University College London, London, United Kingdom.,Department of Health and Life Sciences, Oxford Brookes University, London, UK
David T Jones The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom.,Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom
Cen Wan Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
Domenico Cozzetto Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
Rui Fa Department of Computer Science, University College London, London, United Kingdom.,The Francis Crick Institute, Biomedical Data Science Laboratory, London, United Kingdom
Mateo Torres Centre for Systems and Synthetic Biology, Department of Computer Science, Royal Holloway, University of London, Egham, Surrey, United Kingdom
Alex Warwick Vesztrocy Department of Genetics, Evolution and Environment, University College London, Gower Street, London, WC1E 6BT, United Kingdom.,SIB Swiss Institute of Bioinformatics, Lausanne, 1015, Switzerland
Jose Manuel Rodriguez Cardiovascular Proteomics Laboratory, Centro Nacional de Investigaciones Cardiovasculares Carlos III (CNIC), Madrid, Spain
Michael L Tress Spanish National Cancer Research Centre (CNIO), Madrid, Spain
Marco Frasca Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Marco Notaro Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Giuliano Grossi Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Alessandro Petrini Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Matteo Re Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Giorgio Valentini Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy
Marco Mesiti Università degli Studi di Milano - Computer Science Department - AnacletoLab, Milan, Milan, Italy.,Institut de Biologie Computationnelle, LIRMM, CNRS-UMR 5506, Universite de Montpellier, Montpellier, France
Daniel B Roche Department of Informatics, Bioinformatics and Computational Biology-i12, Technische Universität München, Munich, Germany
Jonas Reeb Department of Informatics, Bioinformatics and Computational Biology-i12, Technische Universität München, Munich, Germany
David W Ritchie University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
Sabeur Aridhi University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France
Seyed Ziaeddin Alborzi University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France.,Inria, Nancy, France
Marie-Dominique Devignes University of Lorraine, CNRS, Inria, LORIA, Nancy, 54000, France.,University of Lorraine, Nancy, Lorraine, France.,Inria, Nancy, France
Da Chen Emily Koo Department of Biology, New York University, New York, NY, USA
Richard Bonneau NYU Center for Data Science, New York, 10010, NY, USA.,Flatiron Institute, CCB, New York, 10010, NY, USA
Vladimir Gligorijević Center for Computational Biology (CCB), Flatiron Institute, Simons Foundation, New York, New York, USA
Meet Barot Center for Data Science, New York University, New York, 10011, NY, USA
Hai Fang Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
Stefano Toppo Department of Molecular Medicine, University of Padova, Padova, Italy
Enrico Lavezzo Department of Molecular Medicine, University of Padova, Padova, Italy
Marco Falda Department of Biology, University of Padova, Padova, Italy
Michele Berselli Department of Molecular Medicine, University of Padova, Padova, Italy
Silvio C E Tosatto CNR Institute of Neuroscience, Padova, Italy.,Department of Biomedical Sciences, University of Padua, Padova, Italy
Marco Carraro Department of Biomedical Sciences, University of Padua, Padova, Italy
Damiano Piovesan Department of Biomedical Sciences, University of Padua, Padova, Italy
Hafeez Ur Rehman Department of Computer Science, National University of Computer and Emerging Sciences, Peshawar, Khyber Pakhtoonkhwa, Pakistan
Qizhong Mao Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA.,University of California, Riverside, Philadelphia, PA, USA
Shanshan Zhang Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
Slobodan Vucetic Department of Computer and Information Sciences, Temple University, Philadelphia, PA, USA
Gage S Black Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
Dane Jo Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
Erica Suh Department of Biology, Brigham Young University, Provo, UT, USA
Jonathan B Dayton Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
Dallas J Larsen Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
Ashton R Omdahl Department of Biology, Brigham Young University, Provo, UT, USA.,Bioinformatics Research Group, Provo, UT, USA
Liam J McGuffin School of Biological Sciences, University of Reading, Reading, England, United Kingdom
Danielle A Brackenridge School of Biological Sciences, University of Reading, Reading, England, United Kingdom
Patricia C Babbitt Department of Pharmaceutical Chemistry, San Francisco, CA, USA.,Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 94158, CA, USA
Jeffrey M Yunes UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, 94158, CA, USA.,Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, 94158, CA, USA
Paolo Fontana Research and Innovation Center, Edmund Mach Foundation, San Michele all'Adige, Italy
Feng Zhang State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Fudan University, Shanghai, Shanghai, China.,Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, Shanghai, China
Shanfeng Zhu School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Ronghui You School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Zihan Zhang School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Suyang Dai School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
Shuwei Yao School of Computer Science and Shanghai Key Lab of Intelligent Information Processing, Fudan University, Shanghai, China.,Institute of Science and Technology for Brain-Inspired Intelligence and Shanghai Institute of Artificial Intelligence Algorithms, Fudan University, Shanghai, China
Weidong Tian State Key Laboratory of Genetic Engineering and Collaborative Innovation Center for Genetics and Development, Department of Biostatistics and Computational Biology, School of Life Sciences, Fudan University, Shanghai, Shanghai, China.,Department of Pediatrics, Brain Tumor Center, Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA
Renzhi Cao Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
Caleb Chandler Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
Miguel Amezola Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
Devon Johnson Department of Computer Science, Pacific Lutheran University, Tacoma, WA, USA
Jia-Ming Chang Department of Computer Science, National Chengchi University, Taipei, Taiwan
Wen-Hung Liao Department of Computer Science, National Chengchi University, Taipei, Taiwan
Yi-Wei Liu Department of Computer Science, National Chengchi University, Taipei, Taiwan
Stefano Pascarelli Okinawa Institute of Science and Technology, Tancha, Okinawa, Japan
Yotam Frank Tel Aviv University, Tel Aviv, Israel
Robert Hoehndorf Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
Maxat Kulmanov Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
Imane Boudellioua Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.,Computer, Electrical and Mathematical Sciences Engineering Division (CEMSE), King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Gianfranco Politano Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
Stefano Di Carlo Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
Alfredo Benso Control and Computer Engineering Department, Politecnico di Torino, Torino, TO, Italy
Kai Hakala Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland
Filip Ginter Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku, Turku, Finland
Farrokh Mehryary Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland
Suwisa Kaewphan Department of Future Technologies, Turku NLP Group, University of Turku, Turku, Finland.,University of Turku Graduate School (UTUGS), Turku, Finland.,Turku Centre for Computer Science (TUCS), Turku, Finland
Jari Björne Department of Future Technologies, Faculty of Science and Engineering, University of Turku, Turku, FI-20014, Finland.,Turku Centre for Computer Science (TUCS), Agora, Vesilinnantie 3, Turku, FI-20500, Finland
Hans Moen University of Turku, Turku, Finland
Martti E E Tolvanen Department of Future Technologies, University of Turku, Turku, Finland
Tapio Salakoski Department of Future Technologies, Faculty of Science and Engineering, University of Turku, Turku, FI-20014, Finland.,Turku Centre for Computer Science (TUCS), Agora, Vesilinnantie 3, Turku, FI-20500, Finland
Daisuke Kihara Department of Biological Sciences, Department of Computer Science, Purdue University, 47907, IN, USA.,Department of Pediatrics, University of Cincinnati, Cincinnati, 45229, OH, USA
Aashish Jain Department of Computer Science, Purdue University, West Lafayette, IN, USA
Tomislav Šmuc Division of Electronics, Rudjer Boskovic Institute, Zagreb, Croatia
Adrian Altenhoff Department of Computer Science, ETH Zurich, Zurich, Switzerland.,SIB Swiss Institute of Bioinformatics, Zurich, Switzerland
Asa Ben-Hur Department of Computer Science, Colorado State University, Fort Collins, CO, USA
Burkhard Rost Department of Informatics, Bioinformatics & Computational Biology-i12, Technische Universität München, Munich, Germany.,Institute for Food and Plant Sciences WZW, Technische Universität München, Freising, Germany
Steven E Brenner University of California, Berkeley, CA, USA
Christine A Orengo Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
Constance J Jeffery Biological Sciences, University of Illinois at Chicago, Chicago, Illinois, USA
Giovanni Bosco Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Deborah A Hogan Geisel School of Medicine at Dartmouth, Hanover, NH, USA.,Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
Maria J Martin European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
Claire O'Donovan European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, United Kingdom
Sean D Mooney Department of Biomedical Informatics and Medical Education, University of Washington, Seattle, WA, USA
Casey S Greene Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.,Childhood Cancer Data Lab, Alex's Lemonade Stand Foundation, Philadelphia, Pennsylvania, USA
Predrag Radivojac Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA
Iddo Friedberg Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA

Mahlich Y, Steinegger M, Rost B, Bromberg Y. HFSP: high speed homology-driven function annotation of proteins. Bioinformatics 2019;34:i304-i312. [PMID: 29950013 PMCID: PMC6022561 DOI: 10.1093/bioinformatics/bty262] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 11/20/2022] Open

Marot-Lassauzaie V, Bernhofer M, Rost B. Correcting mistakes in predicting distributions. Bioinformatics 2019;34:3385-3386. [PMID: 29762646 PMCID: PMC6157078 DOI: 10.1093/bioinformatics/bty346] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/06/2017] [Accepted: 05/08/2018] [Indexed: 11/25/2022] Open

Schafferhans A, O'Donoghue SI, Heinzinger M, Rost B. Dark Proteins Important for Cellular Function. Proteomics 2019;18:e1800227. [PMID: 30318701 DOI: 10.1002/pmic.201800227] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/19/2018] [Revised: 09/14/2018] [Indexed: 01/08/2023]

Bernhofer M, Goldberg T, Wolf S, Ahmed M, Zaugg J, Boden M, Rost B. NLSdb-major update for database of nuclear localization signals and nuclear export signals. Nucleic Acids Res 2019;46:D503-D508. [PMID: 29106588 PMCID: PMC5753228 DOI: 10.1093/nar/gkx1021] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 09/15/2017] [Accepted: 10/18/2017] [Indexed: 11/13/2022] Open

Scheibenreif L, Littmann M, Orengo C, Rost B. FunFam protein families improve residue level molecular function prediction. BMC Bioinformatics 2019;20:400. [PMID: 31319797 PMCID: PMC6639920 DOI: 10.1186/s12859-019-2988-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/18/2019] [Accepted: 07/09/2019] [Indexed: 01/16/2023] Open

Peeken JC, Bernhofer M, Spraker MB, Pfeiffer D, Devecka M, Thamer A, Shouman MA, Ott A, Nüsslin F, Mayr NA, Rost B, Nyflot MJ, Combs SE. CT-based radiomic features predict tumor grading and have prognostic value in patients with soft tissue sarcomas treated with neoadjuvant radiation therapy. Radiother Oncol 2019;135:187-196. [DOI: 10.1016/j.radonc.2019.01.004] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/06/2018] [Revised: 12/19/2018] [Accepted: 01/05/2019] [Indexed: 01/01/2023]

Littmann M, Goldberg T, Seitz S, Bodén M, Rost B. Detailed prediction of protein sub-nuclear localization. BMC Bioinformatics 2019;20:205. [PMID: 31014229 PMCID: PMC6480651 DOI: 10.1186/s12859-019-2790-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 09/27/2018] [Accepted: 04/02/2019] [Indexed: 12/21/2022] Open

Abstract

Background

Sub-nuclear structures or locations are associated with various nuclear processes. Proteins localized in these substructures are important for understanding the interior nuclear mechanisms. Despite advances in high-throughput methods, experimental protein annotations remain limited. Predictions of cellular compartments have become very accurate, largely at the expense of leaving out substructures inside the nucleus, which makes a fine-grained analysis impossible.

Results

Here, we present a new method (LocNuclei) that predicts nuclear substructures from sequence alone. LocNuclei uses a string-based Profile Kernel with Support Vector Machines (SVMs). It distinguishes sub-nuclear localization in 13 distinct substructures and distinguishes between nuclear proteins confined to the nucleus and those that are also native to other compartments (traveler proteins). High performance was achieved by implicitly leveraging a large biological knowledge-base in creating predictions by homology-based inference through BLAST. Using this approach, the performance reached AUC = 0.70–0.74 and Q13 = 59–65%. Traveler proteins (nucleus and other) were identified at Q2 = 70–74%. A Gene Ontology (GO) analysis of the enrichment of biological processes revealed that the predicted sub-nuclear compartments matched the expected functionality. Analysis of protein-protein interactions (PPI) shows that the formation of compartments and the functionality of proteins in these compartments rely heavily on interactions between proteins. This suggested that the LocNuclei predictions carry important information about function. The source code and data sets are available through GitHub: https://github.com/Rostlab/LocNuclei.

Conclusions

LocNuclei predicts subnuclear compartments and traveler proteins accurately. These predictions carry important information about functionality and PPIs.

Electronic supplementary material

The online version of this article (10.1186/s12859-019-2790-9) contains supplementary material, which is available to authorized users.
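The Q13 and Q2 values quoted in the Results can be read as N-state accuracies. The sketch below is a minimal illustration, assuming that Q_N is the percentage of proteins whose single predicted class matches the single observed class; the abstract does not spell out the definition, and the function name `q_accuracy` and the example labels are hypothetical, not taken from LocNuclei.

```python
from typing import Sequence


def q_accuracy(observed: Sequence[str], predicted: Sequence[str]) -> float:
    """Return the percentage of proteins predicted in the observed class.

    Assumption: one observed and one predicted class per protein
    (the paper may treat multi-location proteins differently).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one protein is required")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


# Hypothetical toy labels for four proteins (not real data).
observed = ["nucleolus", "nuclear speckle", "chromatin", "nucleolus"]
predicted = ["nucleolus", "chromatin", "chromatin", "nucleolus"]
q = q_accuracy(observed, predicted)
print(f"Q = {q:.1f}%")
```

With 13 possible class labels the same function would give a Q13-style score, and with the two labels "nucleus only" and "traveler" a Q2-style score.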


Peeken JC, Goldberg T, Pyka T, Bernhofer M, Wiestler B, Kessel KA, Tafti PD, Nüsslin F, Braun AE, Zimmer C, Rost B, Combs SE. Combining multimodal imaging and treatment features improves machine learning-based prognostic assessment in patients with glioblastoma multiforme. Cancer Med 2018;8:128-136. [PMID: 30561851 PMCID: PMC6346243 DOI: 10.1002/cam4.1908] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 06/25/2018] [Revised: 11/14/2018] [Accepted: 11/14/2018] [Indexed: 12/22/2022] Open

Abstract

Background

For Glioblastoma (GBM), various prognostic nomograms have been proposed. This study aims to evaluate machine learning models to predict patients' overall survival (OS) and progression‐free survival (PFS) on the basis of clinical, pathological, semantic MRI‐based, and FET‐PET/CT‐derived information. Finally, the value of adding treatment features was evaluated.

Methods

One hundred and eighty‐nine patients were retrospectively analyzed. We assessed clinical, pathological, and treatment information. The VASARI set of semantic imaging features was determined on MRIs. Metabolic information was retained from preoperative FET‐PET/CT images. We generated multiple random survival forest prediction models on a patient training set and performed internal validation. Single feature class models were created including "clinical," "pathological," "MRI‐based," and "FET‐PET/CT‐based" models, as well as combinations. Treatment features were combined with all other features.

Results

Of all single feature class models, the MRI‐based model had the highest prediction performance on the validation set for OS (C‐index: 0.61 [95% confidence interval: 0.51‐0.72]) and PFS (C‐index: 0.61 [0.50‐0.72]). The combination of all features did increase performance above all single feature class models up to C‐indices of 0.70 (0.59‐0.84) and 0.68 (0.57‐0.78) for OS and PFS, respectively. Adding treatment information further increased prognostic performance up to C‐indices of 0.73 (0.62‐0.84) and 0.71 (0.60‐0.81) on the validation set for OS and PFS, respectively, allowing significant stratification of patient groups for OS.

Conclusions

MRI‐based features were the most relevant feature class for prognostic assessment. Combining clinical, pathological, and imaging information increased predictive power for OS and PFS. A further increase was achieved by adding treatment features.
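The C-indices reported in the Results measure how well predicted risks order patients by survival time. As a rough illustration, the following is a minimal sketch of Harrell's concordance index for right-censored data. It is not the study's code: the function name `concordance_index` and the toy data are assumptions, and the authors' exact estimator and tie handling are not stated in the abstract.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[bool],
                      risks: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ordered correctly.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when i also has the higher predicted risk; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored patients cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: follow-up in months, event flag, predicted risk.
times = [5.0, 8.0, 12.0, 20.0]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risks)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the reported 0.61 to 0.73 should be read.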


Affiliation(s)

Jan C Peeken: Department of Radiation Oncology, Klinikum rechts der Isar der Technischen Universität München (TUM), München, Germany; Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Sciences (DRS), Institute of Innovative Radiotherapy (iRT), Helmholtz Zentrum München, Neuherberg, Germany
Tatyana Goldberg: Allianz SE, Munich, Germany
Thomas Pyka: Department of Nuclear Medicine, Klinikum rechts der Isar der Technischen Universität München (TUM), Munich, Germany
Michael Bernhofer: Department for Bioinformatics and Computational Biology, Technical University of Munich (TUM), Garching, Germany
Benedikt Wiestler: Department of Neuroradiology, Klinikum rechts der Isar der Technischen Universität, Munich (TUM), München, Germany
Kerstin A Kessel: Department of Radiation Oncology, Klinikum rechts der Isar der Technischen Universität München (TUM), München, Germany; Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Sciences (DRS), Institute of Innovative Radiotherapy (iRT), Helmholtz Zentrum München, Neuherberg, Germany
Pouya D Tafti: Allianz SE, Munich, Germany
Fridtjof Nüsslin: Department of Radiation Oncology, Klinikum rechts der Isar der Technischen Universität München (TUM), München, Germany
Andreas E Braun: Allianz SE, Munich, Germany
Claus Zimmer: Department of Neuroradiology, Klinikum rechts der Isar der Technischen Universität, Munich (TUM), München, Germany
Burkhard Rost: Department of Nuclear Medicine, Klinikum rechts der Isar der Technischen Universität München (TUM), Munich, Germany
Stephanie E Combs: Department of Radiation Oncology, Klinikum rechts der Isar der Technischen Universität München (TUM), München, Germany; Deutsches Konsortium für Translationale Krebsforschung (DKTK), Partner Site Munich, Munich, Germany; Department of Radiation Sciences (DRS), Institute of Innovative Radiotherapy (iRT), Helmholtz Zentrum München, Neuherberg, Germany


Schelling M, Hopf TA, Rost B. Evolutionary couplings and sequence variation effect predict protein binding sites. Proteins 2018;86:1064-1074. [DOI: 10.1002/prot.25585] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 12/03/2017] [Revised: 06/14/2018] [Accepted: 07/04/2018] [Indexed: 01/16/2023]

Tran L, Hamp T, Rost B. ProfPPIdb: Pairs of physical protein-protein interactions predicted for entire proteomes. PLoS One 2018;13:e0199988. [PMID: 30020956 PMCID: PMC6051629 DOI: 10.1371/journal.pone.0199988] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/08/2018] [Accepted: 06/17/2018] [Indexed: 01/05/2023] Open

Abstract

Motivation

Protein-protein interactions (PPIs) play a key role in many cellular processes. Most annotations of PPIs mix experimental and computational data. The mix optimizes coverage, but obfuscates the annotation origin. Some resources excel at focusing on reliable experimental data. Here, we focused on new pairs of interacting proteins for several model organisms based solely on sequence-based prediction methods.

Results

We extracted reliable experimental data about which proteins interact (binary) for eight diverse model organisms from public databases, namely from Escherichia coli, Schizosaccharomyces pombe, Plasmodium falciparum, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus, Rattus norvegicus, Arabidopsis thaliana, and for the previously used Homo sapiens and Saccharomyces cerevisiae. Those data were the basis for developing a PPI prediction method for each model organism. The method used evolutionary information through a profile-kernel Support Vector Machine (SVM). With the resulting eight models, we predicted all possible protein pairs in each organism and made the top predictions available through a web application. Almost all of the PPIs made available were predicted between proteins that have not been observed in any interaction, in particular for less well-studied organisms. Thus, our work complements existing resources and is particularly helpful for designing experiments because of its uniqueness. Experimental annotations and computational predictions are strongly influenced by the fact that some proteins have many partners and others few. To optimize machine learning, recent methods explicitly ignore such a network structure and rely either on domain knowledge or on sequence-only methods. Our approach is independent of domain knowledge and leverages evolutionary information. The database interface representing our results is accessible from https://rostlab.org/services/ppipair/. The data can also be downloaded from https://figshare.com/collections/ProfPPI-DB/4141784.
