INSERM EBGM U436

 Prediction of local structures from the sequence



Here we present a simple example of local structure prediction:

    1- The sequence fragment is "GNPSTVQNPRLMNP".

    2- The normalized occurrence matrix associated with PB x is taken. Each entry fij is the frequency of amino acid i at position j divided by the frequency of amino acid i in the databank (i ranges from 1 to 20 and j from -7 to +7; the sequence window is always 15 residues long).

[Figure: Protein Block matrix]
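
The normalization in step 2 can be sketched in Python as follows. This is only an illustration: the function name `normalized_occurrence`, the dictionary layout and the toy frequencies are assumptions, not the actual data or code behind the matrices.

```python
def normalized_occurrence(position_freq, databank_freq):
    """Divide the frequency of each amino acid at each window position
    by the frequency of that amino acid in the whole databank.

    position_freq: dict amino acid -> list of 15 frequencies (j = -7..+7)
    databank_freq: dict amino acid -> frequency in the databank
    """
    return {
        aa: [f / databank_freq[aa] for f in freqs]
        for aa, freqs in position_freq.items()
    }

# Toy numbers (hypothetical, only two amino acids shown)
position_freq = {"G": [0.08] * 15, "P": [0.05] * 15}
position_freq["G"][7] = 0.16          # index 7 is the central position j = 0
databank_freq = {"G": 0.08, "P": 0.05}

matrix = normalized_occurrence(position_freq, databank_freq)
```

A value above 1 means the amino acid is over-represented at that position for this PB; a value below 1 means it is under-represented.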

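    3- The score is the product of the fij values selected by the sequence: for each position j of the window, the value for the amino acid i found at that position is used (shown in pink).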

[Figure: Protein Block matrix and sequence]
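
A minimal sketch of the product in step 3, assuming the matrix is stored as a dictionary indexed by amino acid and then by window position. The name `pb_score` and the three-position toy matrix are hypothetical; the real window has 15 positions.

```python
from math import prod

def pb_score(fragment, matrix):
    """Product of the normalized occurrences selected by the fragment.

    matrix[aa][k] is the value for amino acid aa at window index k.
    """
    return prod(matrix[aa][k] for k, aa in enumerate(fragment))

# Toy matrix for a 3-residue window (made-up values)
toy_matrix = {
    "G": [2.0, 1.0, 1.0],
    "N": [1.0, 0.5, 1.0],
    "P": [1.0, 1.0, 3.0],
}

score = pb_score("GNP", toy_matrix)   # 2.0 * 0.5 * 3.0
```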

    4- The computation is performed for each of the 16 Protein Blocks. The 16 scores are then normalized so that they sum to 1. Then the Neq, or Number of EQuivalents, is computed; it is a confidence index for the prediction.

Protein Block   Score   Normalized score
a               1.93    0.23
b               0.12    0.01
c               0.23    0.03
d               0.02    0.00
e               0.14    0.02
f               0.25    0.03
g               0.56    0.07
h               0.56    0.07
i               0.89    0.11
j               0.63    0.08
k               0.45    0.05
l               0.12    0.01
m               0.05    0.01
n               0.08    0.01
o               0.97    0.12
p               1.23    0.15
sum             8.23    1.00
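
The normalization and the Neq can be sketched from the scores in the table. This page does not give the Neq formula; the sketch assumes the entropy-based definition Neq = exp(-sum of p ln p) over the normalized scores p. The function names are hypothetical.

```python
from math import exp, log

# Raw scores from the table above
scores = {
    "a": 1.93, "b": 0.12, "c": 0.23, "d": 0.02,
    "e": 0.14, "f": 0.25, "g": 0.56, "h": 0.56,
    "i": 0.89, "j": 0.63, "k": 0.45, "l": 0.12,
    "m": 0.05, "n": 0.08, "o": 0.97, "p": 1.23,
}

def normalize(scores):
    """Rescale the scores so that they sum to 1."""
    total = sum(scores.values())
    return {pb: s / total for pb, s in scores.items()}

def neq(probabilities):
    """Assumed definition: exponential of the Shannon entropy."""
    entropy = -sum(p * log(p) for p in probabilities if p > 0)
    return exp(entropy)

normalized = normalize(scores)
n_eq = neq(normalized.values())
```

Under this definition, the Neq equals 1 when a single PB carries all the probability and 16 when the 16 PBs are equally probable.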

    5- The 16 scores are sorted in decreasing order: a, p, o, i, j, g, h, k, f, c, e, b, l, n, m, d. Here the most probable PB is PB a.

    6- If the Neq is small (close to 1), PB a has a good probability of being the true PB. Conversely, if the Neq is high, the sequence used is not informative, so it is more useful to select not just the most probable PB but a set of probable PBs, such as PBs a, p, o, i and j. The Neq value and the desired prediction rate determine the exact number of PBs to select.
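
Steps 5 and 6 amount to ranking the PBs by score and keeping the first few. In this sketch the number of PBs kept is simply passed in as `n`; how to derive it from the Neq and the desired prediction rate is not specified here, so that choice is left to the caller. The function names are hypothetical.

```python
# Raw scores from the table in step 4
scores = {
    "a": 1.93, "b": 0.12, "c": 0.23, "d": 0.02,
    "e": 0.14, "f": 0.25, "g": 0.56, "h": 0.56,
    "i": 0.89, "j": 0.63, "k": 0.45, "l": 0.12,
    "m": 0.05, "n": 0.08, "o": 0.97, "p": 1.23,
}

def rank_pbs(scores):
    """PB labels sorted by decreasing score.

    Python's sort is stable, so tied PBs (g/h, b/l) keep their
    alphabetical order from the dictionary.
    """
    return sorted(scores, key=scores.get, reverse=True)

def top_pbs(scores, n):
    """The n most probable PBs (n is chosen by the caller)."""
    return rank_pbs(scores)[:n]

ranking = rank_pbs(scores)
candidates = top_pbs(scores, 5)
```

With the scores above, `ranking` reproduces the order given in step 5 and `candidates` contains PBs a, p, o, i and j.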

    7- The prediction rate is computed as the proportion of correctly predicted PBs (number of true PBs found as the most probable / total number of PBs).
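
The prediction rate of step 7 can be sketched as a simple fraction. The function name `prediction_rate` and the two short PB strings are hypothetical toy inputs, not real results.

```python
def prediction_rate(true_pbs, predicted_pbs):
    """Fraction of positions where the most probable PB is the true PB."""
    if len(true_pbs) != len(predicted_pbs):
        raise ValueError("both sequences must have the same length")
    correct = sum(t == p for t, p in zip(true_pbs, predicted_pbs))
    return correct / len(true_pbs)

# Toy example: 4 of the 5 positions are correctly predicted
rate = prediction_rate("aabcd", "aabdd")
```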

Last modified: 25 April 2004