Pseudo Amino Acid Composition - Algorithm

Algorithm

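According to the PseAA composition model, the protein P of Eq.1 can be formulated as

  • 
\mathbf{P} = \left[ p_1, p_2, \ldots, p_{20}, p_{20+1}, \ldots, p_{20+\lambda} \right]^{\mathrm{T}}
\qquad \text{(3)}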

where the components are given by

  •  p_u = \begin{cases}
\dfrac {f_u} {\sum_{i=1}^{20}f_i \, + \, w\sum_{k=1}^{\lambda} \tau_k}, & (1 \le u \le 20)
\\
\dfrac {w \tau_{u-20}} {\sum_{i=1}^{20} f_i \, + \, w\sum_{k=1}^{\lambda} \tau_k}, & (20+1 \le u \le 20+\lambda)
\end{cases}
\qquad \text{(4)}

where w is the weight factor and τ_k is the k-th tier correlation factor, which reflects the sequence-order correlation between all the k-th most contiguous residues, as formulated by

  • 
\tau_k = \frac {1}{L-k} \sum_{i=1}^{L-k} \, \mathrm{J}_{i, i+k}, \,\,\, (k < L)
\qquad \text{(5)}

with

  • 
\mathrm{J}_{i, i+k} = \frac{1}{\Gamma} \sum_{q=1}^{\Gamma} \left[ \Phi_q\left(\mathrm{R}_{i+k}\right) - \Phi_q\left(\mathrm{R}_i\right) \right]^2
\qquad \text{(6)}

where Φ_q(R_i) is the q-th function of the amino acid R_i, and Γ is the total number of functions considered. For example, in the original paper by Chou, Φ_1(R_i), Φ_2(R_i) and Φ_3(R_i) are respectively the hydrophobicity value, hydrophilicity value, and side chain mass of amino acid R_i, while Φ_1(R_{i+k}), Φ_2(R_{i+k}) and Φ_3(R_{i+k}) are the corresponding values for the amino acid R_{i+k}. The total number of functions considered there is therefore Γ = 3. It can be seen from Eq.3 that the first 20 components, i.e. p_1, p_2, …, p_20, are associated with the conventional AA composition of the protein, while the remaining components p_{20+1}, …, p_{20+λ} are the correlation factors that reflect the 1st tier, 2nd tier, …, and the λ-th tier sequence-order correlation patterns. It is through these additional λ factors that some important sequence-order effects are incorporated.
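The correlation factors of Eqs. 5 and 6 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `correlation_factors`, the argument names, and the two-property toy table are all assumptions made for the example, and the toy values are placeholders rather than real hydrophobicity, hydrophilicity, or mass scales.

```python
from typing import Dict, List, Sequence


def correlation_factors(
    seq: str,
    props: Sequence[Dict[str, float]],
    lam: int,
) -> List[float]:
    """Return [tau_1, ..., tau_lam] for `seq`, following Eqs. 5 and 6.

    `props` holds one lookup table per property function Phi_q, so
    Gamma = len(props). A residue missing from a table raises KeyError.
    """
    length = len(seq)
    gamma = len(props)
    if gamma == 0:
        raise ValueError("at least one property function is required")
    if not 0 <= lam < length:
        raise ValueError("lambda must satisfy 0 <= lambda < len(seq)")

    taus = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(length - k):
            a, b = seq[i], seq[i + k]
            # Eq. 6: mean squared difference over the Gamma property functions
            total += sum((phi[b] - phi[a]) ** 2 for phi in props) / gamma
        # Eq. 5: average over the L - k residue pairs that are k apart
        taus.append(total / (length - k))
    return taus


# Hypothetical toy tables (Gamma = 2); values are placeholders only.
toy_props = [{"A": 0.0, "C": 1.0}, {"A": 2.0, "C": 0.0}]
print(correlation_factors("ACA", toy_props, 2))  # [2.5, 0.0]
```

For the toy sequence "ACA", both 1st-tier pairs (A,C) and (C,A) give J = (1 + 4) / 2 = 2.5, so τ_1 = 2.5, while the single 2nd-tier pair (A,A) gives τ_2 = 0.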

λ in Eq.3 is an integer parameter; choosing a different value of λ leads to a PseAA composition of a different dimension, namely 20 + λ.
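To make the dependence of the dimension on λ concrete, the following sketch assembles the components of Eq. 4 from given inputs. The function name `pseaa_vector` and its arguments are hypothetical; the 20 values f_u are assumed here to be the amino acid occurrence frequencies and are simply passed in, as are the correlation factors τ_k. The weight 0.05 and the input numbers are arbitrary example values.

```python
from typing import List, Sequence


def pseaa_vector(
    freqs: Sequence[float],
    taus: Sequence[float],
    w: float,
) -> List[float]:
    """Assemble the (20 + lambda)-dimensional vector of Eq. 4.

    `freqs` are the 20 values f_u, `taus` the lambda correlation
    factors tau_k, and `w` the weight factor.
    """
    if len(freqs) != 20:
        raise ValueError("expected 20 amino acid frequencies")
    # Shared denominator of both cases in Eq. 4
    denom = sum(freqs) + w * sum(taus)
    if denom == 0:
        raise ValueError("denominator of Eq. 4 is zero")
    head = [f / denom for f in freqs]      # components 1 .. 20
    tail = [w * t / denom for t in taus]   # components 21 .. 20 + lambda
    return head + tail


# Arbitrary example inputs: uniform frequencies, lambda = 2.
vec = pseaa_vector([0.05] * 20, [2.5, 0.0], w=0.05)
print(len(vec))  # 22
```

Because the numerators of Eq. 4 add up to the shared denominator, the components of the resulting vector sum to 1 (up to floating-point rounding) for any λ; passing an empty `taus` gives the plain 20-dimensional case.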

Using Eq.6 is just one of the modes for deriving the correlation factors or PseAA components. The others, such as the physicochemical distance mode and amphiphilic pattern mode, can also be used to derive different types of PseAA composition, as summarized in a review paper.
