Abstract
A growing number of solved protein structures display an elongated structural
domain, denoted here as alpha-rod, composed of stacked pairs of anti-parallel
alpha-helices. Alpha-rods are flexible and expose a large surface, which makes
them suitable for protein interaction. Although alpha-rods most likely originated
by tandem duplication of a two-helix unit, their detection by sequence
similarity between repeats is poor. Here, we show that alpha-rod repeats can be
detected using a neural network. The network detects more repeats than are
identified by domain databases using multiple profiles, with a low level of
false positives (<10%). We identify alpha-rod repeats in
approximately 0.4% of proteins in eukaryotic genomes. We then
investigate the results for all human proteins, identifying alpha-rod repeats
for the first time in six protein families, including the proteins STAG1-3,
SERAC1, PSMD1, PSMD2, and PSMD5. We also characterize a short version of these repeats
in eight protein families of Archaeal, Bacterial, and Fungal species. Finally,
we demonstrate the utility of these predictions in directing experimental work
to demarcate three alpha-rods in huntingtin, a protein mutated in
Huntington's disease. Using yeast two-hybrid analysis and an
immunoprecipitation technique, we show that the huntingtin fragments containing
alpha-rods associate with each other. This is the first definition of domains in
huntingtin and the first validation of predicted interactions between fragments
of huntingtin, which sets up directions toward functional characterization of
this protein. An implementation of the repeat detection algorithm is available
as a Web server with a simple graphical output: http://www.ogic.ca/projects/ard.
The output can be further visualized using BiasViz, a graphical tool for
representing multiple sequence alignments.
Author Summary
Many proteins have an elongated structural domain formed by a stack of alpha
helices (alpha-rod), often found to interact with other proteins. The
identification of an alpha-rod in a protein can therefore be informative about
both the function and the structure of that protein. Although alpha-rods can be
readily identified from a protein's structure, the structure is unavailable for
the vast majority of known proteins, and we must rely on the amino acid
sequence. Because alpha-rods have highly variable sequences, commonly used
methods of domain identification by sequence similarity have difficulty
detecting them. However, alpha-rods do have specific patterns of amino acid
properties along their sequences, so we used a computational method based on a
neural network to learn these patterns. We illustrate how this method finds
novel instances of the domain in proteins from a wide range of organisms. We
performed a detailed analysis of huntingtin, the protein mutated in
Huntington's chorea, a neurodegenerative disease. The function of
huntingtin remains a mystery partially due to the lack of knowledge about its
structure. Therefore, we defined three alpha-rods in this protein and
experimentally verified that they interact with each other, a novel result that
opens new avenues for huntingtin research.