Perovic V, Leclercq JY, Sumonja N, Richard FD, Veljkovic N, Kajava AV. Tally-2.0: upgraded validator of tandem repeat detection in protein sequences.
Bioinformatics 2020;36:3260-3262. [PMID: 32096820 DOI: 10.1093/bioinformatics/btaa121]
[Received: 11/28/2019] [Revised: 02/02/2020] [Accepted: 02/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION
Proteins containing tandem repeats (TRs) are abundant, frequently fold into elongated non-globular structures and perform vital functions. A number of computational tools have been developed to detect TRs in protein sequences. Because the boundary between imperfect TR motifs and non-repetitive sequences is blurred, the detected TRs need to be validated.
RESULTS
Tally-2.0 is a scoring tool based on a machine learning (ML) approach that validates the results of TR detection. It was upgraded by using improved training datasets and additional ML features. Tally-2.0 achieves 93% sensitivity, 83% specificity and an area under the receiver operating characteristic curve of 95%.
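For readers unfamiliar with the three reported metrics, the following is a minimal sketch of how sensitivity, specificity and the area under the ROC curve can be computed for any score-based validator. It is not the Tally-2.0 implementation: the function names, the toy labels and scores, and the 0.5 threshold are all assumptions chosen for illustration.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) at a given score threshold.

    labels: 1 for a true tandem repeat, 0 for a non-repetitive sequence.
    scores: classifier outputs; a score >= threshold is called a repeat.
    """
    tp = fn = tn = fp = 0
    for label, score in zip(labels, scores):
        predicted_repeat = score >= threshold
        if label == 1:
            if predicted_repeat:
                tp += 1
            else:
                fn += 1
        else:
            if predicted_repeat:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, not taken from the Tally-2.0 benchmark.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

sens, spec = sensitivity_specificity(toy_labels, toy_scores, threshold=0.5)
auc = roc_auc(toy_labels, toy_scores)
print(sens, spec, auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance across all thresholds, which is why the abstract reports all three.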
AVAILABILITY AND IMPLEMENTATION
Tally-2.0 is available as a web tool and as a standalone application, published under Apache License 2.0, at https://bioinfo.crbm.cnrs.fr/index.php?route=tools&tool=27. It is supported on Linux. Source code is available upon request.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.