1. Zhao N, Wu T, Wang W, Zhang L, Gong X. Review and Comparative Analysis of Methods and Advancements in Predicting Protein Complex Structure. Interdiscip Sci 2024; 16:261-288. [PMID: 38955920 DOI: 10.1007/s12539-024-00626-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2023] [Revised: 02/29/2024] [Accepted: 03/01/2024] [Indexed: 07/04/2024]
Abstract
Protein complexes perform diverse biological functions, and obtaining their three-dimensional structures is critical to understanding those functions. In many cases, it is not just two proteins interacting to form a dimer; instead, multiple proteins interact to form a multimer. Experimentally resolving protein complex structures can be quite challenging. Recently, methods have been developed that build upon prior predictions of dimer structures to predict multimer structures. However, in comparison to monomeric protein structure prediction, the accuracy of protein complex structure prediction remains relatively low. This paper provides an overview of recent advancements in efficient computational models for predicting protein complex structures. We introduce protein-protein docking methods in detail and summarize their main ideas, applicable modes, and related information. To enhance prediction accuracy, other critical protein-related information is also integrated, such as predicted interchain residue contacts, experimental data such as cryo-EM, and knowledge of protein interactions and non-interactions. In addition, we comprehensively review computational approaches for end-to-end prediction of protein complex structures based on artificial intelligence (AI) technology and describe commonly used datasets and representative evaluation metrics for protein complexes. Finally, we analyze the formidable challenges faced in current protein complex structure prediction tasks, including structure prediction for heteromeric complexes, disordered regions in complexes, antibody-antigen complexes, and RNA-related complexes, as well as the evaluation metrics for complex assessment. We hope that this work will provide comprehensive knowledge of complex structure prediction and contribute to future advances.
Affiliation(s)
- Nan Zhao
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Tong Wu
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Wenda Wang
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Lunchuan Zhang
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
- Xinqi Gong
  - Institute for Mathematical Sciences, Renmin University of China, Beijing, 100872, China
  - School of Mathematics, Renmin University of China, Beijing, 100872, China
  - Beijing Academy of Artificial Intelligence, Beijing, 100084, China
2. Yang Q, Jin X, Zhou H, Ying J, Zou J, Liao Y, Lu X, Ge S, Yu H, Min X. SurfPro-NN: A 3D point cloud neural network for the scoring of protein-protein docking models based on surfaces features and protein language models. Comput Biol Chem 2024; 110:108067. [PMID: 38714420 DOI: 10.1016/j.compbiolchem.2024.108067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Revised: 03/18/2024] [Accepted: 04/01/2024] [Indexed: 05/09/2024]
Abstract
Protein-protein interactions (PPI) play a crucial role in numerous key biological processes, and the structure of protein complexes provides valuable clues for in-depth exploration of molecular-level biological processes. Protein-protein docking technology is widely used to simulate the spatial structure of proteins. However, there are still challenges in selecting candidate decoys that closely resemble the native structure from protein-protein docking simulations. In this study, we introduce a docking evaluation method based on three-dimensional point cloud neural networks named SurfPro-NN, which represents protein structures as point clouds and learns interaction information from protein interfaces by applying a point cloud neural network. With the continuous advancement of deep learning in the field of biology, a series of knowledge-rich pre-trained models have emerged. We incorporate protein surface representation models and language models into our approach, greatly enhancing feature representation capabilities and achieving superior performance in protein docking model scoring tasks. Through comprehensive testing on public datasets, we find that our method outperforms state-of-the-art deep learning approaches in protein-protein docking model scoring. Not only does it significantly improve performance, but it also greatly accelerates training speed. This study demonstrates the potential of our approach in addressing protein interaction assessment problems, providing strong support for future research and applications in the field of biology.
Affiliation(s)
- Qianli Yang
  - Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Xiaocheng Jin
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Haixia Zhou
  - School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Junjie Ying
  - Institute of Artificial Intelligence, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- JiaJun Zou
  - School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Yiyang Liao
  - School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Xiaoli Lu
  - Information and Networking Center, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Shengxiang Ge
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Hai Yu
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - School of Public Health, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
- Xiaoping Min
  - School of Informatics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - National Institute of Diagnostics and Vaccine Development in Infectious Diseases, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
  - State Key Laboratory of Molecular Vaccinology and Molecular Diagnostics, XiaMen University, No. 422, Siming South Road, XiaMen, 361005, Fujian, China
3. Chen X, Liu J, Park N, Cheng J. A Survey of Deep Learning Methods for Estimating the Accuracy of Protein Quaternary Structure Models. Biomolecules 2024; 14:574. [PMID: 38785981 PMCID: PMC11117562 DOI: 10.3390/biom14050574] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2024] [Revised: 04/07/2024] [Accepted: 05/09/2024] [Indexed: 05/25/2024] Open
Abstract
The quality prediction of quaternary structure models of a protein complex, in the absence of its true structure, is known as the Estimation of Model Accuracy (EMA). EMA is useful for ranking predicted protein complex structures and using them appropriately in biomedical research, such as protein-protein interaction studies, protein design, and drug discovery. With the advent of more accurate protein complex (multimer) prediction tools, such as AlphaFold2-Multimer and ESMFold, the estimation of the accuracy of protein complex structures has attracted increasing attention. Many deep learning methods have been developed to tackle this problem; however, there is a noticeable absence of a comprehensive overview of these methods to facilitate future development. Addressing this gap, we present a review of deep learning EMA methods for protein complex structures developed in the past several years, analyzing their methodologies, data and feature construction. We also provide a prospective summary of some potential new developments for further improving the accuracy of the EMA methods.
Affiliation(s)
- Xiao Chen
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
- Nolan Park
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
  - NextGen Precision Health Institute, University of Missouri, Columbia, MO 65211, USA
4. Ovek D, Keskin O, Gursoy A. ProInterVal: Validation of Protein-Protein Interfaces through Learned Interface Representations. J Chem Inf Model 2024; 64:2979-2987. [PMID: 38526504 PMCID: PMC11040718 DOI: 10.1021/acs.jcim.3c01788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Revised: 02/21/2024] [Accepted: 02/22/2024] [Indexed: 03/26/2024]
Abstract
Proteins are vital components of the biological world and serve a multitude of functions. They interact with other molecules through their interfaces and participate in crucial cellular processes. Disruption of these interactions can have negative effects on organisms, highlighting the importance of studying protein-protein interfaces for developing targeted therapies for diseases. Therefore, the development of a reliable method for investigating protein-protein interactions is of paramount importance. In this work, we present an approach for validating protein-protein interfaces using learned interface representations. The approach involves using a graph-based contrastive autoencoder architecture and a transformer to learn representations of protein-protein interaction interfaces from unlabeled data and then validating them through learned representations with a graph neural network. Our method achieves an accuracy of 0.91 for the test set, outperforming existing GNN-based methods. We demonstrate the effectiveness of our approach on a benchmark data set and show that it provides a promising solution for validating protein-protein interfaces.
Affiliation(s)
- Damla Ovek
  - KUIS AI Center, Koç University, Istanbul 34450, Turkey
  - Computer Engineering, Koç University, Istanbul 34450, Turkey
- Ozlem Keskin
  - Chemical and Biological Engineering, Koç University, Istanbul 34450, Turkey
- Attila Gursoy
  - Computer Engineering, Koç University, Istanbul 34450, Turkey
5. Kuder KJ. Docking Foundations: From Rigid to Flexible Docking. Methods Mol Biol 2024; 2780:3-14. [PMID: 38987460 DOI: 10.1007/978-1-0716-3985-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Despite the development of methods for the experimental determination of protein structures, the dissonance between the number of known sequences and their solved structures is still enormous. This is particularly evident in protein-protein complexes. To fill this gap, diverse technologies have been developed to study protein-protein interactions (PPIs) in a cellular context including a range of biological and computational methods. The latter derive from techniques originally published and applied almost half a century ago and are based on interdisciplinary knowledge from the nexus of the fields of biology, chemistry, and physics about protein sequences, structures, and their folding. Protein-protein docking, the main protagonist of this chapter, is routinely treated as an integral part of protein research. Herein, we describe the basic foundations of the whole process in general terms, but step by step from protein representations through docking methods and evaluation of complexes to their final validation.
Affiliation(s)
- Kamil J Kuder
  - Department of Technology and Biotechnology of Drugs, Faculty of Pharmacy, Jagiellonian University Medical College, Kraków, Poland
6. Krupa MA, Krupa P. Free-Docking and Template-Based Docking: Physics Versus Knowledge-Based Docking. Methods Mol Biol 2024; 2780:27-41. [PMID: 38987462 DOI: 10.1007/978-1-0716-3985-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/12/2024]
Abstract
Docking methods can be used to predict the orientations of two or more molecules with respect to each other using a variety of algorithms, which can be based on the physics of interactions or can use information from databases and templates. The usability of these approaches depends on the type and size of the molecules whose relative orientation is to be estimated. The two most important limitations are (i) the computational cost of the prediction and (ii) the availability of structural information for similar complexes. In general, if there is enough information about similar systems, knowledge-based and template-based methods can significantly reduce the computational cost while providing high prediction accuracy. However, if information about the system topology and the interactions between its partners is scarce, physics-based methods are more reliable or even the only choice. In this chapter, knowledge-, template-, and physics-based methods are compared and briefly discussed, with examples of their usability and a special emphasis on physics-based protein-protein, protein-peptide, and protein-fullerene docking in the UNRES coarse-grained model.
Affiliation(s)
- Magdalena A Krupa
  - Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
- Paweł Krupa
  - Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
7. Chen Z, Liu N, Huang Y, Min X, Zeng X, Ge S, Zhang J, Xia N. PointDE: Protein Docking Evaluation Using 3D Point Cloud Neural Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3128-3138. [PMID: 37220029 DOI: 10.1109/tcbb.2023.3279019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Protein-protein interactions (PPIs) play essential roles in many vital activities, and determining the structure of a protein complex helps to uncover the mechanism of a PPI. Protein-protein docking is being developed to model the structure of protein complexes. However, selecting the near-native decoys among those generated by protein-protein docking remains a challenge. Here, we propose a docking evaluation method using a 3D point cloud neural network, named PointDE. PointDE transforms a protein structure into a point cloud. Using a state-of-the-art point cloud network architecture and a novel grouping mechanism, PointDE can capture the geometries of the point cloud and learn the interaction information from the protein interface. On public datasets, PointDE surpasses the state-of-the-art deep learning method. To further explore the ability of our method on different types of protein structures, we developed a new dataset generated from high-quality antibody-antigen complexes. The results on this antibody-antigen dataset show the strong performance of PointDE, which will be helpful for understanding PPI mechanisms.
8. Sunny S, Prakash PB, Gopakumar G, Jayaraj PB. DeepBindPPI: Protein-Protein Binding Site Prediction Using Attention Based Graph Convolutional Network. Protein J 2023; 42:276-287. [PMID: 37198346 PMCID: PMC10191823 DOI: 10.1007/s10930-023-10121-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/25/2023] [Indexed: 05/19/2023]
Abstract
Because protein-protein interactions are important to the body's defence mechanisms, attempts have been made to investigate their attributes, including, but not limited to, binding affinity and binding region. Contemporary strategies for binding site prediction largely rely on deep learning techniques but have turned out to be low-precision models. As laboratory experiments for drug discovery tasks use this information, an increased number of false positives devalues the computational methods. This emphasizes the need to develop enhanced strategies. DeepBindPPI employs a deep learning technique to predict the binding regions of proteins, particularly antigen-antibody interaction sites. The results obtained are applied in a docking environment to confirm their correctness. An integration of a graph convolutional network with an attention mechanism predicts interacting amino acids with improved precision. The model learns the determining factors in interaction from a general pool of proteins and is then fine-tuned using antigen-antibody data. Comparison of the proposed method with existing techniques shows that the developed model has comparable performance. The use of a separate spatial network clearly improved the precision of the proposed method from 0.4 to 0.5. An attempt to utilize the interface information for docking using the HDOCK server gives promising results, with high-quality structures appearing in the top 10 ranks.
Affiliation(s)
- Sharon Sunny
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
- G. Gopakumar
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
- P. B. Jayaraj
  - Department of CSE, National Institute of Technology, Calicut, Kerala 673601 India
9. Williams NP, Rodrigues CHM, Truong J, Ascher DB, Holien JK. DockNet: high-throughput protein-protein interface contact prediction. Bioinformatics 2022; 39:6885444. [PMID: 36484688 PMCID: PMC9825772 DOI: 10.1093/bioinformatics/btac797] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/19/2022] [Revised: 10/27/2022] [Accepted: 12/08/2022] [Indexed: 12/13/2022] Open
Abstract
Motivation: Over 300 000 protein-protein interaction (PPI) pairs have been identified in the human proteome and targeting these is fast becoming the next frontier in drug design. Predicting PPI sites, however, is a challenging task that traditionally requires computationally expensive and time-consuming docking simulations. A major weakness of modern protein docking algorithms is the inability to account for protein flexibility, which ultimately leads to relatively poor results.
Results: Here, we propose DockNet, an efficient Siamese graph-based neural network method which predicts contact residues between two interacting proteins. Unlike other methods that only utilize a protein's surface or treat the protein structure as a rigid body, DockNet incorporates the entire protein structure and places no limits on protein flexibility during an interaction. Predictions are modeled at the residue level, based on a diverse set of input node features including residue type, surface accessibility, residue depth, secondary structure, pharmacophore and torsional angles. DockNet is comparable to current state-of-the-art methods, achieving an area under the curve (AUC) value of up to 0.84 on an independent test set (DB5), can be applied to a variety of different protein structures and can be utilized in situations where accurate unbound protein structures cannot be obtained.
Availability and implementation: DockNet is available at https://github.com/npwilliams09/docknet and an easy-to-use webserver at https://biosig.lab.uq.edu.au/docknet. All other data underlying this article are available in the article and in its online supplementary material.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jia Truong
  - STEM College, RMIT University, Melbourne, VIC, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, VIC, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD, Australia
10. Long L, Wei J, Lim SA, Raynor JL, Shi H, Connelly JP, Wang H, Guy C, Xie B, Chapman NM, Fu G, Wang Y, Huang H, Su W, Saravia J, Risch I, Wang YD, Li Y, Niu M, Dhungana Y, Kc A, Zhou P, Vogel P, Yu J, Pruett-Miller SM, Peng J, Chi H. CRISPR screens unveil signal hubs for nutrient licensing of T cell immunity. Nature 2021; 600:308-313. [PMID: 34795452 PMCID: PMC8887674 DOI: 10.1038/s41586-021-04109-7] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 11/12/2020] [Accepted: 10/07/2021] [Indexed: 12/26/2022]
Abstract
Nutrients are emerging regulators of adaptive immunity [1]. Selective nutrients interplay with immunological signals to activate mechanistic target of rapamycin complex 1 (mTORC1), a key driver of cell metabolism [2-4], but how these environmental signals are integrated for immune regulation remains unclear. Here we use genome-wide CRISPR screening combined with protein-protein interaction networks to identify regulatory modules that mediate immune receptor- and nutrient-dependent signalling to mTORC1 in mouse regulatory T (Treg) cells. SEC31A is identified to promote mTORC1 activation by interacting with the GATOR2 component SEC13 to protect it from SKP1-dependent proteasomal degradation. Accordingly, loss of SEC31A impairs T cell priming and Treg suppressive function in mice. In addition, the SWI/SNF complex restricts expression of the amino acid sensor CASTOR1, thereby enhancing mTORC1 activation. Moreover, we reveal that the CCDC101-associated SAGA complex is a potent inhibitor of mTORC1, which limits the expression of glucose and amino acid transporters and maintains T cell quiescence in vivo. Specific deletion of Ccdc101 in mouse Treg cells results in uncontrolled inflammation but improved antitumour immunity. Collectively, our results establish epigenetic and post-translational mechanisms that underpin how nutrient transporters, sensors and transducers interplay with immune signals for three-tiered regulation of mTORC1 activity and identify their pivotal roles in licensing T cell immunity and immune tolerance.
Affiliation(s)
- Lingyun Long
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jun Wei
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Seon Ah Lim
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jana L Raynor
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Hao Shi
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jon P Connelly
  - Center for Advanced Genome Engineering, St. Jude Children's Research Hospital, Memphis, TN, USA
- Hong Wang
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Cliff Guy
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Boer Xie
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Nicole M Chapman
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Guotong Fu
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yanyan Wang
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Hongling Huang
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Wei Su
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jordy Saravia
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Isabel Risch
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yong-Dong Wang
  - Department of Cell and Molecular Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yuxin Li
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Mingming Niu
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
- Yogesh Dhungana
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Anil Kc
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Peipei Zhou
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Peter Vogel
  - Department of Pathology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jiyang Yu
  - Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Shondra M Pruett-Miller
  - Center for Advanced Genome Engineering, St. Jude Children's Research Hospital, Memphis, TN, USA
- Junmin Peng
  - Center for Proteomics and Metabolomics, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Department of Structural Biology, St. Jude Children's Research Hospital, Memphis, TN, USA
  - Department of Developmental Neurobiology, St. Jude Children's Research Hospital, Memphis, TN, USA
- Hongbo Chi
  - Department of Immunology, St. Jude Children's Research Hospital, Memphis, TN, USA
11.
Abstract
The biological significance of proteins has drawn the scientific community to explore their characteristics. These studies shed light on the interaction patterns and functions of proteins in a living body. Experimental techniques are reliable, but their practical difficulties have paved the way for computational methods of interaction prediction. Automated methods have reduced these difficulties but cannot yet replace experimental studies, as the field is still evolving. Interaction prediction is a critical problem that demands highly accurate results, yet none of the existing methods offers performance on par with experimental results. This article assesses the existing computational docking algorithms, their challenges, and their future scope. Blind docking techniques are helpful when no information other than the individual structures is available. As more and more complex structures are added to databases, information-driven approaches can be a good alternative. Artificial intelligence, already prominent in many major fields, is expected to take over this domain in the near future.
12. Quignot C, Postic G, Bret H, Rey J, Granger P, Murail S, Chacón P, Andreani J, Tufféry P, Guerois R. InterEvDock3: a combined template-based and free docking server with increased performance through explicit modeling of complex homologs and integration of covariation-based contact maps. Nucleic Acids Res 2021; 49:W277-W284. [PMID: 33978743 PMCID: PMC8265070 DOI: 10.1093/nar/gkab358] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/13/2021] [Revised: 04/09/2021] [Accepted: 04/23/2021] [Indexed: 12/19/2022] Open
Abstract
The InterEvDock3 protein docking server exploits the constraints of evolution by multiple means to generate structural models of protein assemblies. The server takes as input either several sequences or 3D structures of proteins known to interact. It returns a set of 10 consensus candidate complexes, together with interface predictions to guide further experimental validation interactively. Three key novelties were implemented in InterEvDock3 to help obtain more reliable models: users can (i) generate template-based structural models of assemblies using close and remote homologs of known 3D structure, detected through an automated search protocol, (ii) select the assembly models most consistent with contact maps from external methods that implement covariation-based contact prediction with or without deep learning and (iii) exploit a novel coevolution-based scoring scheme at atomic level, which leads to significantly higher free docking success rates. The performance of the server was validated on two large free docking benchmark databases, containing respectively 230 unbound targets (Weng dataset) and 812 models of unbound targets (PPI4DOCK dataset). Its effectiveness has also been proven on a number of challenging examples. The InterEvDock3 web interface is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock3/.
Affiliation(s)
- Chloé Quignot
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Guillaume Postic
  - Université de Paris, CNRS UMR 8251, INSERM U1133, RPBS, Paris 75205, France
- Hélène Bret
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Julien Rey
  - Université de Paris, CNRS UMR 8251, INSERM U1133, RPBS, Paris 75205, France
- Pierre Granger
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Samuel Murail
  - Université de Paris, CNRS UMR 8251, INSERM U1133, RPBS, Paris 75205, France
- Pablo Chacón
  - Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
- Pierre Tufféry
  - Université de Paris, CNRS UMR 8251, INSERM U1133, RPBS, Paris 75205, France
- Raphaël Guerois
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198 Gif-sur-Yvette, France
| |
Collapse
|
13
|
Pal A, Pal D, Mitra P. A computational framework for modeling functional protein-protein interactions. Proteins 2021; 89:1353-1364. [PMID: 34076296 DOI: 10.1002/prot.26156] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/01/2020] [Revised: 04/17/2021] [Accepted: 05/19/2021] [Indexed: 11/06/2022]
Abstract
Protein interactions and their assemblies assist in understanding the cellular mechanisms through the knowledge of interactome. Despite recent advances, a vast number of interacting protein complexes is not annotated by three-dimensional structures. Therefore, a computational framework is a suitable alternative to fill the large gap between identified interactions and the interactions with known structures. In this work, we develop an automated computational framework for modeling functionally related protein-complex structures utilizing GO-based semantic similarity technique and co-evolutionary information of the interaction sites. The framework can consider protein sequence and structure information as input and employ both rigid-body docking and template-based modeling exploiting the existing structural templates and sequence homology information from the PDB. Our framework combines geometric as well as physicochemical features for re-ranking the docking decoys. The proposed framework has an 83% success rate when tested on a benchmark dataset while considering Top1 models for template-based modeling and Top10 models for the docking pipeline. We believe that our computational framework can be used for any pair of proteins with higher confidence to identify the functional protein-protein interactions.
Affiliation(s)
- Abantika Pal
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
- Debnath Pal
  - Department of Computational and Data Sciences, Indian Institute of Science Bangalore, Bangalore, India
- Pralay Mitra
  - Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, India
|
14
|
Quignot C, Granger P, Chacón P, Guerois R, Andreani J. Atomic-level evolutionary information improves protein-protein interface scoring. Bioinformatics 2021; 37:3175-3181. [PMID: 33901284 DOI: 10.1093/bioinformatics/btab254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/26/2020] [Revised: 03/20/2021] [Accepted: 04/19/2021] [Indexed: 12/11/2022] Open
Abstract
MOTIVATION: The crucial role of protein interactions and the difficulty in characterising them experimentally strongly motivate the development of computational approaches for structural prediction. Even when protein-protein docking samples correct models, current scoring functions struggle to discriminate them from incorrect decoys. The previous incorporation of conservation and coevolution information has shown promise for improving protein-protein scoring. Here, we present a novel strategy to integrate atomic-level evolutionary information into different types of scoring functions to improve their docking discrimination. RESULTS: We applied this general strategy to our residue-level statistical potential from InterEvScore and to two atomic-level scores, SOAP-PP and Rosetta interface score (ISC). Including evolutionary information from as few as ten homologous sequences improves the top 10 success rates of the individual atomic-level scores SOAP-PP and Rosetta ISC by 6 and 13.5 percentage points, respectively, on a large benchmark of 752 docking cases. The best individual homology-enriched score reaches a top 10 success rate of 34.4%. A consensus approach based on the complementarity between different homology-enriched scores further increases the top 10 success rate to 40%. AVAILABILITY: All data used for benchmarking and scoring results, as well as a Singularity container of the pipeline, are available at http://biodev.cea.fr/interevol/interevdata/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
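The "top 10 success rate" quoted in this abstract is commonly understood as the fraction of docking cases in which at least one correct model appears among the ten best-ranked decoys. A minimal Python sketch of that calculation follows, under that assumption; the function name `top_n_success_rate`, the boolean-list input format, and the toy cases are hypothetical illustrations, not part of the authors' pipeline or benchmark.

```python
from typing import Dict, List


def top_n_success_rate(ranked_labels: Dict[str, List[bool]], n: int = 10) -> float:
    """Fraction of docking cases with at least one correct model in the top n.

    ranked_labels maps a case identifier to a list of booleans ordered by
    rank (best-scored decoy first); True marks a decoy judged correct.
    """
    if not ranked_labels:
        raise ValueError("need at least one docking case")
    # A case counts as a success if any of its first n decoys is correct.
    hits = sum(1 for labels in ranked_labels.values() if any(labels[:n]))
    return hits / len(ranked_labels)


# Toy data, invented for illustration only (not taken from any benchmark).
cases = {
    "case_A": [False, True, False],          # correct decoy at rank 2
    "case_B": [False] * 12 + [True],         # correct decoy only at rank 13
    "case_C": [False, False, False, False],  # no correct decoy sampled
    "case_D": [True],                        # correct decoy at rank 1
}

rate = top_n_success_rate(cases, n=10)
print(f"top-10 success rate: {rate:.1%}")
```

On this reading, a gain reported in percentage points is an absolute difference between two such rates computed on the same set of cases, not a relative improvement.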
Affiliation(s)
- Chloé Quignot
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Pierre Granger
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Pablo Chacón
  - Department of Biological Chemical Physics, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
- Raphael Guerois
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
|
15
|
Eismann S, Townshend RJL, Thomas N, Jagota M, Jing B, Dror RO. Hierarchical, rotation-equivariant neural networks to select structural models of protein complexes. Proteins 2020; 89:493-501. [PMID: 33289162 DOI: 10.1002/prot.26033] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/16/2020] [Revised: 10/10/2020] [Accepted: 11/21/2020] [Indexed: 12/16/2022]
Abstract
Predicting the structure of multi-protein complexes is a grand challenge in biochemistry, with major implications for basic science and drug discovery. Computational structure prediction methods generally leverage predefined structural features to distinguish accurate structural models from less accurate ones. This raises the question of whether it is possible to learn characteristics of accurate models directly from atomic coordinates of protein complexes, with no prior assumptions. Here we introduce a machine learning method that learns directly from the 3D positions of all atoms to identify accurate models of protein complexes, without using any precomputed physics-inspired or statistical terms. Our neural network architecture combines multiple ingredients that together enable end-to-end learning from molecular structures containing tens of thousands of atoms: a point-based representation of atoms, equivariance with respect to rotation and translation, local convolutions, and hierarchical subsampling operations. When used in combination with previously developed scoring functions, our network substantially improves the identification of accurate structural models among a large set of possible models. Our network can also be used to predict the accuracy of a given structural model in absolute terms. The architecture we present is readily applicable to other tasks involving learning on 3D structures of large atomic systems.
Affiliation(s)
- Stephan Eismann
  - Department of Applied Physics, Stanford University, Stanford, California, USA
  - Department of Computer Science, Stanford University, Stanford, California, USA
- Nathaniel Thomas
  - Department of Physics, Stanford University, Stanford, California, USA
- Milind Jagota
  - Department of Computer Science, Stanford University, Stanford, California, USA
  - Department of Electrical Engineering, Stanford University, Stanford, California, USA
- Bowen Jing
  - Department of Computer Science, Stanford University, Stanford, California, USA
- Ron O Dror
  - Department of Computer Science, Stanford University, Stanford, California, USA
  - Department of Structural Biology, Stanford University, Stanford, California, USA
  - Department of Molecular and Cellular Physiology, Stanford University, Stanford, California, USA
  - Institute for Computational and Mathematical Engineering, Stanford University, Stanford, California, USA
|
16
|
Integrative modeling of membrane-associated protein assemblies. Nat Commun 2020; 11:6210. [PMID: 33277503 PMCID: PMC7718903 DOI: 10.1038/s41467-020-20076-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 07/31/2020] [Accepted: 11/13/2020] [Indexed: 01/03/2023] Open
Abstract
Membrane proteins are among the most challenging systems to study with experimental structural biology techniques. The increased number of deposited structures of membrane proteins has opened the route to modeling their complexes by methods such as docking. Here, we present an integrative computational protocol for the modeling of membrane-associated protein assemblies. The information encoded by the membrane is represented by artificial beads, which allow targeting of the docking toward the binding-competent regions. It combines efficient, artificial intelligence-based rigid-body docking by LightDock with a flexible final refinement with HADDOCK to remove potential clashes at the interface. We demonstrate the performance of this protocol on eighteen membrane-associated complexes, whose interface lies between the membrane and either the cytosolic or periplasmic regions. In addition, we provide a comparison to another state-of-the-art docking software, ZDOCK. This protocol should shed light on the still dark fraction of the interactome consisting of membrane proteins.
|
17
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Chloé Quignot
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Raphael Guerois
  - Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
|
18
|
Graves J, Byerly J, Priego E, Makkapati N, Parish SV, Medellin B, Berrondo M. A Review of Deep Learning Methods for Antibodies. Antibodies (Basel) 2020; 9:E12. [PMID: 32354020 PMCID: PMC7344881 DOI: 10.3390/antib9020012] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/01/2020] [Revised: 04/15/2020] [Accepted: 04/16/2020] [Indexed: 01/09/2023] Open
Abstract
Driven by its successes across domains such as computer vision and natural language processing, deep learning has recently entered the field of biology by aiding in cellular image classification, finding genomic connections, and advancing drug discovery. In drug discovery and protein engineering, a major goal is to design a molecule that will perform a useful function as a therapeutic drug. Typically, the focus has been on small molecules, but new approaches have been developed to apply these same principles of deep learning to biologics, such as antibodies. Here we give a brief background of deep learning as it applies to antibody drug development, and an in-depth explanation of several deep learning algorithms that have been proposed to solve aspects of both protein design in general, and antibody design in particular.
Affiliation(s)
- Monica Berrondo
  - Macromoltek, Inc, 2500 W William Cannon Dr, Suite 204, Austin, TX 78745, USA
|
19
|
Quignot C, Rey J, Yu J, Tufféry P, Guerois R, Andreani J. InterEvDock2: an expanded server for protein docking using evolutionary and biological information from homology models and multimeric inputs. Nucleic Acids Res 2018; 46:W408-W416. [PMID: 29741647 PMCID: PMC6030979 DOI: 10.1093/nar/gky377] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 01/31/2018] [Accepted: 05/02/2018] [Indexed: 12/15/2022] Open
Abstract
Computational protein docking is a powerful strategy to predict structures of protein-protein interactions and provides crucial insights for the functional characterization of macromolecular cross-talks. We previously developed InterEvDock, a server for ab initio protein docking based on rigid-body sampling followed by consensus scoring using physics-based and statistical potentials, including the InterEvScore function specifically developed to incorporate co-evolutionary information in docking. InterEvDock2 is a major evolution of InterEvDock which allows users to submit input sequences – not only structures – and multimeric inputs and to specify constraints for the pairwise docking process based on previous knowledge about the interaction. For this purpose, we added modules in InterEvDock2 for automatic template search and comparative modeling of the input proteins. The InterEvDock2 pipeline was benchmarked on 812 complexes for which unbound homology models of the two partners and co-evolutionary information are available in the PPI4DOCK database. InterEvDock2 identified a correct model among the top 10 consensus in 29% of these cases (compared to 15–24% for individual scoring functions) and at least one correct interface residue among 10 predicted in 91% of these cases. InterEvDock2 is thus a unique protein docking server, designed to be useful for the experimental biology community. The InterEvDock2 web interface is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/InterEvDock2/.
Affiliation(s)
- Chloé Quignot
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Julien Rey
  - INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, RPBS, Paris 75205, France
- Jinchao Yu
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Pierre Tufféry
  - INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, RPBS, Paris 75205, France
- Raphaël Guerois
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
- Jessica Andreani
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette cedex, France
|
20
|
Dapkūnas J, Olechnovič K, Venclovas Č. Structural modeling of protein complexes: Current capabilities and challenges. Proteins 2019; 87:1222-1232. [PMID: 31294859 DOI: 10.1002/prot.25774] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 04/30/2019] [Revised: 06/21/2019] [Accepted: 07/06/2019] [Indexed: 12/27/2022]
Abstract
Proteins frequently interact with each other, and the knowledge of structures of the corresponding protein complexes is necessary to understand how they function. Computational methods are increasingly used to provide structural models of protein complexes. Not surprisingly, community-wide Critical Assessment of protein Structure Prediction (CASP) experiments have recently started monitoring the progress in this research area. We participated in CASP13 with the aim to evaluate our current capabilities in modeling of protein complexes and to gain a better understanding of factors that exert the largest impact on these capabilities. To model protein complexes in CASP13, we applied template-based modeling, free docking and hybrid techniques that enabled us to generate models of the topmost quality for 27 of 42 multimers. If templates for protein complexes could be identified, we modeled the structures with reasonable accuracy by straightforward homology modeling. If only partial templates were available, it was nevertheless possible to predict the interaction interfaces correctly or to generate acceptable models for protein complexes by combining template-based modeling with docking. If no templates were available, we used rigid-body docking with limited success. However, in some free docking models, despite the incorrect subunit orientation and missed interface contacts, the approximate location of protein binding sites was identified correctly. Apparently, our overall performance in docking was limited by the quality of monomer models and by the imperfection of scoring methods. The impact of human intervention on our results in modeling of protein complexes was significant indicating the need for improvements of automatic methods.
Affiliation(s)
- Justas Dapkūnas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Kliment Olechnovič
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Česlovas Venclovas
  - Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
|
21
|
Computational approaches to macromolecular interactions in the cell. Curr Opin Struct Biol 2019; 55:59-65. [PMID: 30999240 DOI: 10.1016/j.sbi.2019.03.012] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 12/24/2018] [Accepted: 03/08/2019] [Indexed: 12/15/2022]
Abstract
Structural modeling of a cell is an evolving strategic direction in computational structural biology. It takes advantage of new powerful modeling techniques, deeper understanding of fundamental principles of molecular structure and assembly, and rapid growth of the amount of structural data generated by experimental techniques. Key modeling approaches to principal types of macromolecular assemblies in a cell already exist. The main challenge, along with the further development of these modeling approaches, is putting them together in a consistent, unified whole cell model. This opinion piece addresses the fundamental aspects of modeling macromolecular assemblies in a cell, and the state-of-the-art in modeling of the principal types of such assemblies.
|
22
|
Integrative analysis of pooled CRISPR genetic screens using MAGeCKFlute. Nat Protoc 2019; 14:756-780. [PMID: 30710114 PMCID: PMC6862721 DOI: 10.1038/s41596-018-0113-7] [Citation(s) in RCA: 287] [Impact Index Per Article: 47.8] [Received: 01/24/2018] [Accepted: 12/07/2018] [Indexed: 12/25/2022]
Abstract
Genome-wide screening using CRISPR coupled with nuclease Cas9 (CRISPR-Cas9) is a powerful technology for the systematic evaluation of gene function. Statistically principled analysis is needed for the accurate identification of gene hits and associated pathways. Here, we describe how to perform computational analysis of CRISPR screens using the MAGeCKFlute pipeline. MAGeCKFlute combines the MAGeCK and MAGeCK-VISPR algorithms and incorporates additional downstream analysis functionalities. MAGeCKFlute is distinguished from other currently available tools by its comprehensive pipeline, which contains a series of functions for analyzing CRISPR screen data. This protocol explains how to use MAGeCKFlute to perform quality control (QC), normalization, batch effect removal, copy-number bias correction, gene hit identification and downstream functional enrichment analysis for CRISPR screens. We also describe gene identification and data analysis in CRISPR screens involving drug treatment. Completing the entire MAGeCKFlute pipeline requires ~3 h on a desktop computer running Linux or Mac OS with R support.
|
23
|
McFarland JM, Ho ZV, Kugener G, Dempster JM, Montgomery PG, Bryan JG, Krill-Burger JM, Green TM, Vazquez F, Boehm JS, Golub TR, Hahn WC, Root DE, Tsherniak A. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat Commun 2018. [PMID: 30389920 DOI: 10.6084/m9.figshare.6025238.v6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/10/2023] Open
Abstract
The availability of multiple datasets comprising genome-scale RNAi viability screens in hundreds of diverse cancer cell lines presents new opportunities for understanding cancer vulnerabilities. Integrated analyses of these data to assess differential dependency across genes and cell lines are challenging due to confounding factors such as batch effects and variable screen quality, as well as difficulty assessing gene dependency on an absolute scale. To address these issues, we incorporated cell line screen-quality parameters and hierarchical Bayesian inference into DEMETER2, an analytical framework for analyzing RNAi screens (https://depmap.org/R2-D2). This model substantially improves estimates of gene dependency across a range of performance measures, including identification of gold-standard essential genes and agreement with CRISPR/Cas9-based viability screens. It also allows us to integrate information across three large RNAi screening datasets, providing a unified resource representing the most extensive compilation of cancer cell line genetic dependencies to date.
Affiliation(s)
- Zandra V Ho
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Jordan G Bryan
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Thomas M Green
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Francisca Vazquez
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
- Jesse S Boehm
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Todd R Golub
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Harvard Medical School, Boston, 02115, MA, USA
  - Boston Children's Hospital, Boston, 02115, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, 20815, MD, USA
- William C Hahn
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Harvard Medical School, Boston, 02115, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA
- David E Root
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Aviad Tsherniak
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
|
24
|
McFarland JM, Ho ZV, Kugener G, Dempster JM, Montgomery PG, Bryan JG, Krill-Burger JM, Green TM, Vazquez F, Boehm JS, Golub TR, Hahn WC, Root DE, Tsherniak A. Improved estimation of cancer dependencies from large-scale RNAi screens using model-based normalization and data integration. Nat Commun 2018; 9:4610. [PMID: 30389920 PMCID: PMC6214982 DOI: 10.1038/s41467-018-06916-5] [Citation(s) in RCA: 252] [Impact Index Per Article: 36.0] [Received: 04/24/2018] [Accepted: 10/01/2018] [Indexed: 01/03/2023] Open
Abstract
The availability of multiple datasets comprising genome-scale RNAi viability screens in hundreds of diverse cancer cell lines presents new opportunities for understanding cancer vulnerabilities. Integrated analyses of these data to assess differential dependency across genes and cell lines are challenging due to confounding factors such as batch effects and variable screen quality, as well as difficulty assessing gene dependency on an absolute scale. To address these issues, we incorporated cell line screen-quality parameters and hierarchical Bayesian inference into DEMETER2, an analytical framework for analyzing RNAi screens (https://depmap.org/R2-D2). This model substantially improves estimates of gene dependency across a range of performance measures, including identification of gold-standard essential genes and agreement with CRISPR/Cas9-based viability screens. It also allows us to integrate information across three large RNAi screening datasets, providing a unified resource representing the most extensive compilation of cancer cell line genetic dependencies to date. Integrated analyses of multiple large-scale screenings can be complicated by batch effects and technical artefacts. McFarland et al. introduce DEMETER2, a hierarchical model coupled with model-based normalization, which allows the assessment of differential dependencies across genes and cell lines.
Affiliation(s)
- Zandra V Ho
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Jordan G Bryan
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Thomas M Green
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Francisca Vazquez
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
- Jesse S Boehm
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Todd R Golub
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Harvard Medical School, Boston, 02115, MA, USA
  - Boston Children's Hospital, Boston, 02115, MA, USA
  - Howard Hughes Medical Institute, Chevy Chase, 20815, MD, USA
- William C Hahn
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
  - Dana-Farber Cancer Institute, Boston, 02215, MA, USA
  - Harvard Medical School, Boston, 02115, MA, USA
  - Department of Medicine, Brigham and Women's Hospital, Boston, 02115, MA, USA
- David E Root
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
- Aviad Tsherniak
  - Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA
|