1
|
Kim S. Maximum feasibility estimation. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.04.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
2
|
|
3
|
Abstract
Motivation: Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design.
Results: We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN, and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size for exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.
Availability and implementation: https://shen-lab.github.io/software/iCFN.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
- Yang Shen
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
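The abstract of entry 3 describes each substate design as a WCSP solved by depth-first branch-and-bound. The following is a minimal sketch of that general technique on a single toy pairwise WCSP, not the iCFN algorithm itself: the cost tables, the function names, and the simple lower bound are all illustrative assumptions, and the bound is only valid because every cost is non-negative.

```python
from typing import Dict, List, Tuple

# Toy pairwise WCSP; every number here is made up for illustration.
# UNARY[i][a]: cost of assigning value a to variable i.
# BINARY[(i, j)][a][b]: cost of the pair (i=a, j=b), stored for i < j.
UNARY: List[List[float]] = [[0, 2], [1, 0], [3, 0]]
BINARY: Dict[Tuple[int, int], List[List[float]]] = {
    (0, 1): [[0, 4], [1, 0]],
    (1, 2): [[0, 2], [5, 0]],
    (0, 2): [[1, 0], [0, 3]],
}


def pair_cost(binary, i, a, j, b):
    """Cost of the pair (i=a, j=b); pairs with no table cost nothing."""
    if (i, j) in binary:
        return binary[(i, j)][a][b]
    if (j, i) in binary:
        return binary[(j, i)][b][a]
    return 0.0


def total_cost(assign, unary, binary):
    """Full cost of a complete assignment (used only for checking)."""
    n = len(assign)
    cost = sum(unary[i][assign[i]] for i in range(n))
    cost += sum(pair_cost(binary, i, assign[i], j, assign[j])
                for i in range(n) for j in range(i + 1, n))
    return cost


def solve(unary, binary):
    """Depth-first branch-and-bound; returns (best cost, best assignment)."""
    n = len(unary)
    best_cost = float("inf")
    best_assign: List[int] = []

    def extension_cost(assign, j, b):
        # Cost added by setting variable j to b, given the assigned prefix.
        return unary[j][b] + sum(
            pair_cost(binary, i, assign[i], j, b) for i in range(len(assign)))

    def lower_bound(assign, cost):
        # Each unassigned variable takes its cheapest value against the prefix.
        # Costs between two unassigned variables are ignored, which keeps the
        # bound valid only when all costs are non-negative.
        return cost + sum(
            min(extension_cost(assign, j, b) for b in range(len(unary[j])))
            for j in range(len(assign), n))

    def dfs(assign, cost):
        nonlocal best_cost, best_assign
        if len(assign) == n:
            if cost < best_cost:
                best_cost, best_assign = cost, list(assign)
            return
        if lower_bound(assign, cost) >= best_cost:
            return  # prune: this subtree cannot beat the incumbent
        j = len(assign)
        for b in range(len(unary[j])):
            step = extension_cost(assign, j, b)
            assign.append(b)
            dfs(assign, cost + step)
            assign.pop()

    dfs([], 0.0)
    return best_cost, best_assign


if __name__ == "__main__":
    print(solve(UNARY, BINARY))
```

Variables are branched in fixed index order here; real CFN solvers use much stronger bounds (local consistencies) and dynamic variable ordering.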
|
4
|
Traoré S, Allouche D, André I, Schiex T, Barbe S. Deterministic Search Methods for Computational Protein Design. Methods Mol Biol 2017; 1529:107-123. [PMID: 27914047 DOI: 10.1007/978-1-4939-6637-0_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 06/06/2023]
Abstract
One main challenge in Computational Protein Design (CPD) lies in the exploration of the amino-acid sequence space, while considering, to some extent, side chain flexibility. The exorbitant size of the search space calls for the development of efficient exact deterministic search methods enabling identification of low-energy sequence-conformation models, corresponding either to the global minimum energy conformation (GMEC) or an ensemble of guaranteed near-optimal solutions. In contrast to stochastic local search methods that are not guaranteed to find the GMEC, exact deterministic approaches always identify the GMEC and prove its optimality in finite but exponential worst-case time. After a brief overview of these two classes of methods, we discuss the grounds and merits of four deterministic methods that have been applied to solve CPD problems. These approaches are based either on the Dead-End-Elimination theorem combined with the A* algorithm (DEE/A*), on Cost Function Network algorithms (CFN), on Integer Linear Programming solvers (ILP) or on Markov Random Field solvers (MRF). The way two of these methods (DEE/A* and CFN) can be used in practice to identify low-energy sequence-conformation models starting from a pairwise decomposed energy matrix is detailed in this review.
Affiliation(s)
- Seydou Traoré
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
- CNRS, UMR5504, 31400, Toulouse, France
- David Allouche
- Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
- Isabelle André
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France
- CNRS, UMR5504, 31400, Toulouse, France
- Thomas Schiex
- Unité de Mathématiques et Informatique de Toulouse, UR 875, INRA, 31320, Castanet Tolosan, France
- Sophie Barbe
- INSA, UPS, INP, Université de Toulouse, 135 Avenue de Rangueil, 31077, Toulouse, France.
- Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés - INSA, INRA, UMR792, 31400, Toulouse, France.
- CNRS, UMR5504, 31400, Toulouse, France.
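The abstract of entry 4 names Dead-End Elimination applied to a pairwise decomposed energy matrix. As a hedged illustration of what such pruning looks like, the sketch below applies the Goldstein elimination criterion to a toy energy matrix: rotamer r at position i is removed if some other rotamer t satisfies E(i_r) - E(i_t) + sum over j of min over s of [E(i_r, j_s) - E(i_t, j_s)] > 0. The energy values and all names are made up for illustration and are not taken from the cited chapter.

```python
from typing import Dict, List, Set, Tuple

# Toy pairwise-decomposed energy matrix; all values are illustrative.
# UNARY[i][r]: self energy of rotamer r at position i.
# PAIR[(i, j)][r][s]: interaction energy of (i=r, j=s), stored for i < j.
UNARY: List[List[float]] = [[0, 2], [1, 0], [6, 0]]
PAIR: Dict[Tuple[int, int], List[List[float]]] = {
    (0, 1): [[0, 4], [1, 0]],
    (1, 2): [[0, 2], [5, 0]],
    (0, 2): [[1, 0], [0, 3]],
}


def pair_energy(pair, i, r, j, s):
    """Interaction energy of (i=r, j=s); missing pairs contribute zero."""
    if i < j:
        table = pair.get((i, j))
        return table[r][s] if table else 0.0
    table = pair.get((j, i))
    return table[s][r] if table else 0.0


def energy(assign, unary, pair):
    """Total energy of a complete assignment (used only for checking)."""
    n = len(assign)
    total = sum(unary[i][assign[i]] for i in range(n))
    total += sum(pair_energy(pair, i, assign[i], j, assign[j])
                 for i in range(n) for j in range(i + 1, n))
    return total


def goldstein_dee(unary, pair) -> List[Set[int]]:
    """Iterate Goldstein elimination until no rotamer can be removed."""
    n = len(unary)
    alive = [set(range(len(u))) for u in unary]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                for t in sorted(alive[i]):
                    if t == r:
                        continue
                    # Best case for r relative to t over the surviving
                    # rotamers at every other position.
                    gap = unary[i][r] - unary[i][t]
                    for j in range(n):
                        if j != i:
                            gap += min(pair_energy(pair, i, r, j, s)
                                       - pair_energy(pair, i, t, j, s)
                                       for s in alive[j])
                    if gap > 0:
                        # t is better than r in every context: r is a dead end.
                        alive[i].discard(r)
                        changed = True
                        break
    return alive


if __name__ == "__main__":
    print(goldstein_dee(UNARY, PAIR))
```

On this toy instance the pruning happens to leave a single rotamer per position; in general DEE only shrinks the search space, and a search such as A* is still needed over what remains.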
|
5
|
Allouche D, Bessiere C, Boizumault P, de Givry S, Gutierrez P, Lee JH, Leung KL, Loudni S, Métivier JP, Schiex T, Wu Y. Tractability-preserving transformations of global cost functions. Artif Intell 2016. [DOI: 10.1016/j.artint.2016.06.005] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
|
6
|
Traoré S, Roberts KE, Allouche D, Donald BR, André I, Schiex T, Barbe S. Fast search algorithms for computational protein design. J Comput Chem 2016; 37:1048-58. [PMID: 26833706 PMCID: PMC4828276 DOI: 10.1002/jcc.24290] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 04/14/2015] [Revised: 09/23/2015] [Accepted: 11/27/2015] [Indexed: 12/12/2022]
Abstract
One of the main challenges in computational protein design (CPD) is the huge size of the protein sequence and conformational space that has to be computationally explored. Recently, we showed that state-of-the-art combinatorial optimization technologies based on Cost Function Network (CFN) processing allow speeding up provable rigid backbone protein design methods by several orders of magnitude. Building on this, we improved and injected CFN technology into the well-established CPD package Osprey to allow all Osprey CPD algorithms to benefit from associated speedups. Because Osprey fundamentally relies on the ability of A* to produce conformations in increasing order of energy, we defined new A* strategies combining CFN lower bounds with a new side-chain positioning-based branching scheme. Beyond the speedups obtained in the new A*-CFN combination, this novel branching scheme enables a much faster enumeration of suboptimal sequences, far beyond what is reachable without it. Together with the immediate and important speedups provided by CFN technology, these developments directly benefit all the algorithms that previously relied on the DEE/A* combination inside Osprey and make it possible to solve larger CPD problems with provable algorithms.
Affiliation(s)
- Seydou Traoré
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
- Kyle E. Roberts
- Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
- David Allouche
- Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
- Bruce R. Donald
- Department of Biochemistry, Department of Computer Science, Department of Chemistry, Duke University, Durham, NC, USA
- Isabelle André
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
- Thomas Schiex
- Unité de Mathématiques et Informatique Appliquées de Toulouse, UR 875, INRA, F-31320 Castanet Tolosan, France
- Sophie Barbe
- Université de Toulouse; INSA, UPS, INP; LISBP, 135 Avenue de Rangueil, F-31077 Toulouse, France
- INRA, UMR792, Ingénierie des Systèmes Biologiques et des Procédés, F-31400 Toulouse, France
- CNRS, UMR5504, F-31400 Toulouse, France
|
7
|
Roberts KE, Gainza P, Hallen MA, Donald BR. Fast gap-free enumeration of conformations and sequences for protein design. Proteins 2015; 83:1859-1877. [PMID: 26235965 DOI: 10.1002/prot.24870] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/24/2015] [Revised: 07/14/2015] [Accepted: 07/21/2015] [Indexed: 12/12/2022]
Abstract
Despite significant successes in structure-based computational protein design in recent years, protein design algorithms must be improved to increase the biological accuracy of new designs. Protein design algorithms search through an exponential number of protein conformations, protein ensembles, and amino acid sequences in an attempt to find globally optimal structures with a desired biological function. To improve the biological accuracy of protein designs, it is necessary to increase both the amount of protein flexibility allowed during the search and the overall size of the design, while guaranteeing that the lowest-energy structures and sequences are found. DEE/A*-based algorithms are the most prevalent provable algorithms in the field of protein design and can provably enumerate a gap-free list of low-energy protein conformations, which is necessary for ensemble-based algorithms that predict protein binding. We present two classes of algorithmic improvements to the A* algorithm that greatly increase the efficiency of A*. First, we analyze the effect of ordering the expansion of mutable residue positions within the A* tree and present a dynamic residue ordering that reduces the number of A* nodes that must be visited during the search. Second, we propose new methods to improve the conformational bounds used to estimate the energies of partial conformations during the A* search. The residue ordering techniques and improved bounds can be combined for additional increases in A* efficiency. Our enhancements enable all A*-based methods to more fully search protein conformation space, which will ultimately improve the accuracy of complex biomedically relevant designs.
Affiliation(s)
- Kyle E Roberts
- Department of Computer Science, Duke University, Durham, NC
- Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC
- Mark A Hallen
- Department of Computer Science, Duke University, Durham, NC
- Bruce R Donald
- Department of Computer Science, Duke University, Durham, NC; Department of Biochemistry, Duke University Medical Center, Durham, NC; Department of Chemistry, Duke University, Durham, NC
|
8
|
Soto R, Crawford B, Palma W, Galleguillos K, Castro C, Monfroy E, Johnson F, Paredes F. Boosting autonomous search for CSPs via skylines. Inf Sci (N Y) 2015. [DOI: 10.1016/j.ins.2015.01.035] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 10/23/2022]
|
9
|
Allouche D, André I, Barbe S, Davies J, de Givry S, Katsirelos G, O'Sullivan B, Prestwich S, Schiex T, Traoré S. Computational protein design as an optimization problem. Artif Intell 2014. [DOI: 10.1016/j.artint.2014.03.005] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
|
10
|
Climent L, Wallace RJ, Salido MA, Barber F. Finding robust solutions for constraint satisfaction problems with discrete and ordered domains by coverings. Artif Intell Rev 2013. [DOI: 10.1007/s10462-013-9420-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
|
11
|
Traoré S, Allouche D, André I, de Givry S, Katsirelos G, Schiex T, Barbe S. A new framework for computational protein design through cost function network optimization. Bioinformatics 2013; 29:2129-36. [DOI: 10.1093/bioinformatics/btt374] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 11/13/2022] Open
|
12
|
Helaoui M, Naanaa W, Ayeb B. Submodularity-based decomposing for valued CSP. Int J Artif Intell T 2013. [DOI: 10.1142/s0218213013500061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Many combinatorial problems can be formulated as Valued Constraint Satisfaction Problems (VCSPs). In this framework, the constraints are defined by means of valuation functions to reflect several degrees of coherence. Despite the NP-hardness of the VCSP, tractable versions can be obtained by forcing the allowable valuation functions to have specific features. This is the case for submodular VCSPs, i.e. VCSPs that involve submodular valuation functions only. In this paper, we propose a problem decomposition scheme for binary VCSPs that takes advantage of submodular functions even when the studied problem is not submodular. The proposed scheme consists of decomposing the problem to be solved into a set of submodular, and hence tractable, subproblems. The decomposition scheme combines two techniques that were already used in the framework of constraint-based reasoning, but separately. These techniques are domain partitioning and value permutation.
Affiliation(s)
- Wady Naanaa
- Faculty of Sciences, University of Monastir, Tunisia
- Bechir Ayeb
- Faculty of Sciences, University of Monastir, Tunisia
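The abstract of entry 12 relies on submodular valuation functions and on value permutation. As a hedged illustration of those two notions only (not the decomposition scheme of the cited paper), the sketch below checks the standard lattice definition of submodularity for a binary cost function on ordered domains, f(min(u, v)) + f(max(u, v)) <= f(u) + f(v) with min and max taken componentwise, and shows that relabelling the values of one variable can turn a non-submodular function into a submodular one. The `DIFFER` cost table and the function names are assumptions made for this example.

```python
from itertools import product
from typing import Dict, List, Tuple

Cost = Dict[Tuple[int, int], float]


def is_submodular(f: Cost, dom_x: List[int], dom_y: List[int]) -> bool:
    """Check f(meet) + f(join) <= f(u) + f(v) for every pair of tuples."""
    tuples = list(product(dom_x, dom_y))
    for (a, b), (c, d) in product(tuples, repeat=2):
        meet = (min(a, c), min(b, d))  # componentwise minimum
        join = (max(a, c), max(b, d))  # componentwise maximum
        if f[meet] + f[join] > f[(a, b)] + f[(c, d)]:
            return False
    return True


def permute_second(f: Cost, perm: Dict[int, int]) -> Cost:
    """Relabel the second variable: old value y becomes perm[y]."""
    return {(x, perm[y]): cost for (x, y), cost in f.items()}


# Illustrative "must differ" soft constraint on two Boolean variables:
# cost 1 when both variables take the same value, cost 0 otherwise.
DIFFER: Cost = {(x, y): (1.0 if x == y else 0.0)
                for x, y in product([0, 1], repeat=2)}

if __name__ == "__main__":
    print(is_submodular(DIFFER, [0, 1], [0, 1]))
    swapped = permute_second(DIFFER, {0: 1, 1: 0})
    print(is_submodular(swapped, [0, 1], [0, 1]))
```

Swapping the two values of the second variable turns the "must differ" cost into a "must be equal" cost, which satisfies the inequality; this is the kind of effect a value permutation is meant to achieve.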
|
13
|
Ansótegui C, Bonet ML, Levy J, Manyà F. Resolution procedures for multiple-valued optimization. Inf Sci (N Y) 2013. [DOI: 10.1016/j.ins.2012.12.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/27/2022]
|
14
|
|
15
|
|
16
|
Bistarelli S, Codognet P, Hui H, Lee J. Solving finite domain constraint hierarchies by local consistency and tree search. J Exp Theor Artif In 2009. [DOI: 10.1080/09528130802667690] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/20/2022]
|
17
|
Marinescu R, Dechter R. AND/OR Branch-and-Bound search for combinatorial optimization in graphical models. Artif Intell 2009. [DOI: 10.1016/j.artint.2009.07.003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 10/20/2022]
|
18
|
Kohlas J, Wilson N. Semiring induced valuation algebras: Exact and approximate local computation algorithms. Artif Intell 2008. [DOI: 10.1016/j.artint.2008.03.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 10/22/2022]
|
19
|
|
20
|
|
21
|
Thébault P, de Givry S, Schiex T, Gaspin C. Searching RNA motifs and their intermolecular contacts with constraint networks. Bioinformatics 2006; 22:2074-80. [PMID: 16820426 DOI: 10.1093/bioinformatics/btl354] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/14/2022] Open
Abstract
Motivation: Searching RNA gene occurrences in genomic sequences is a task whose importance has been renewed by the recent discovery of numerous functional RNAs, often interacting with other ligands. Even if several programs exist for RNA motif search, none exists that can represent and solve the problem of searching for occurrences of RNA motifs in interaction with other molecules.
Results: We present a constraint network formulation of this problem. RNAs are represented as structured motifs that can occur on more than one sequence and which are related together by possible hybridization. The implemented tool MilPat is used to search for several sRNA families in genomic sequences. Results show that MilPat allows efficient search for interacting motifs in large genomic sequences and offers a simple and extensible framework to solve such problems. New and known sRNAs are identified as H/ACA candidates in Methanocaldococcus jannaschii.
Availability: http://carlit.toulouse.inra.fr/MilPaT/MilPat.pl.
Affiliation(s)
- P Thébault
- Unité de Biométrie & Intelligence Artificielle, INRA, Chemin de Borde Rouge Auzeville, BP 52627, 31326 Castanet-Tolosan, France
|
22
|
|
23
|
|