A Statistical Similarity/Dissimilarity Analysis of Protein Sequences Based on a Novel Group Representative Vector.
BIOMED RESEARCH INTERNATIONAL 2019;
2019:8702968. [PMID:
31205946 PMCID:
PMC6530227 DOI:
10.1155/2019/8702968]
[Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/06/2019] [Revised: 03/13/2019] [Accepted: 04/07/2019] [Indexed: 11/30/2022]
Abstract
Similarity/dissimilarity analysis is a key way of understanding the biology of an organism by knowing the origin of the new genes/sequences. Sequence data are grouped in terms of biological relationships. The number of sequences related to any group is susceptible to be increased every day. All the present alignment-free methods approve the utility of their approaches by producing a similarity/dissimilarity matrix. Although this matrix is clear, it measures the degree of similarity among sequences individually. In our work, a representative of each of three groups of protein sequences is introduced. A similarity/dissimilarity vector is evaluated instead of the ordinary similarity/dissimilarity matrix based on the group representative. The approach is applied on three selected groups of protein sequences: beta globin, NADH dehydrogenase subunit 5 (ND5), and spike protein sequences. A cross-grouping comparison is produced to ensure the singularity of each group. A qualitative comparison between our approach, previous articles, and the phylogenetic tree of these protein sequences proved the utility of our approach.
Collapse