2 Base Encoding Considerations

In practice, directly translating color reads into base reads is not advised: a single error in the color calls corrupts every base call downstream of it, much like a frameshift. To best leverage the "error correction" properties of two-base encoding, convert the base reference sequence into color-space instead. There is exactly one unambiguous conversion of a base reference sequence into color-space, but there are four possible conversions of a color string into base strings, one for each choice of first base. Amino acid translation is a useful analogy: bases translate unambiguously into amino acids, but a given amino acid sequence can be back-translated into many base sequences.
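The conversion in both directions can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: it assumes the commonly published color table in which each color is the XOR of the 2-bit codes A=0, C=1, G=2, T=3 of two adjacent bases, and the function names `bases_to_colors` and `colors_to_bases` are hypothetical.

```python
# Assumed 2-bit base codes; XOR of adjacent codes gives the color (0-3).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = "ACGT"


def bases_to_colors(bases):
    """Encode a base string as one color per pair of adjacent bases."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(bases, bases[1:])]


def colors_to_bases(first_base, colors):
    """Decode a color list into bases, given the first base."""
    codes = [BASE_CODE[first_base]]
    for color in colors:
        # Each base depends on the previous base and the current color.
        codes.append(codes[-1] ^ color)
    return "".join(CODE_BASE[c] for c in codes)


reference = "ATGGCATC"
colors = bases_to_colors(reference)       # exactly one encoding

# With the correct first base, decoding recovers the reference.
decoded = colors_to_bases("A", colors)

# Without it, there are four candidate decodings, one per first base.
all_decodings = [colors_to_bases(b, colors) for b in "ACGT"]

# A single color error at index 2 changes every base decoded after it.
bad_colors = list(colors)
bad_colors[2] ^= 1
corrupted = colors_to_bases("A", bad_colors)

print(colors)
print(all_decodings)
print(reference, corrupted)
```

Because each decoded base depends on the one before it, the single flipped color leaves the first three bases intact and alters all five that follow, which is why decoding reads before alignment is discouraged.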

Mapping color-space reads to a color-space reference makes proper use of the two-base encoding rules, under which only differences in adjacent colors can represent a true base polymorphism. Directly decoding the color reads into bases cannot do this efficiently without additional knowledge.

More precisely, two-base encoding is not an error correction tool but an error transformation tool. Color-space makes the most common error mode (single measurement errors) look different from the most common form of DNA variation (SNPs, or single base changes): a single measurement error changes one color, whereas a single base change changes two adjacent colors. Logical rules then classify adjacent color changes as 'valid' (consistent with a base change) or 'invalid'.
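One way to express such a rule is sketched below, under the assumption that colors are the XOR of adjacent 2-bit base codes. In that scheme, changing a single base alters both colors that overlap it but leaves the XOR of those two colors unchanged. The function name `classify_adjacent_change` is hypothetical.

```python
def classify_adjacent_change(ref_pair, read_pair):
    """Classify two adjacent color mismatches against the reference.

    Under the assumed XOR encoding, a single base change preserves the
    XOR of the two colors that overlap it, so the pair is 'valid' only
    if the reference XOR and the read XOR agree.
    """
    (r1, r2), (q1, q2) = ref_pair, read_pair
    if r1 == q1 or r2 == q2:
        raise ValueError("both colors must differ from the reference")
    return "valid" if (r1 ^ r2) == (q1 ^ q2) else "invalid"


# Enumerate every way to change both colors of one reference pair.
ref_pair = (3, 1)
outcomes = [
    classify_adjacent_change(ref_pair, (q1, q2))
    for q1 in range(4) if q1 != ref_pair[0]
    for q2 in range(4) if q2 != ref_pair[1]
]
valid_count = outcomes.count("valid")
print(valid_count, "of", len(outcomes), "adjacent changes are valid")
```

Of the nine ways to change both colors, three preserve the XOR, so one third of adjacent double changes are consistent with a single base substitution.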

The likelihood of two adjacent errors in a 50-bp read can be estimated. There are 49 ways of changing two adjacent positions in a 50-letter string (a 50-bp read), out of 1225 ways of choosing any two positions (50 choose 2), adjacent or not. Simplistically, if errors are assumed to be completely random (in practice they are more frequent toward the end of reads), only 49 out of 1225 double errors will be candidates for SNPs. In addition, only one third of the adjacent errors are valid according to the known labeling of the probes, leaving roughly 16 out of 1225 double errors as SNP candidates. This is particularly useful for SNP detection at low coverage, where it reduces false positives (Smith et al.).
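The arithmetic can be checked with a short calculation. It keeps the same simplifying assumption of uniformly random errors, and the variable names are illustrative only. Note that 50 choose 2 counts every pair of positions, adjacent or not.

```python
from math import comb

read_length = 50

adjacent_pairs = read_length - 1          # neighbouring position pairs
all_pairs = comb(read_length, 2)          # any two positions
non_adjacent_pairs = all_pairs - adjacent_pairs

# Only one third of adjacent double changes are valid under the probe labeling.
valid_adjacent = adjacent_pairs / 3
fraction_snp_candidates = valid_adjacent / all_pairs

print(adjacent_pairs, all_pairs, non_adjacent_pairs)
print(round(valid_adjacent, 1), round(fraction_snp_candidates, 4))
```

Under these assumptions, only a little over 1% of random double errors would mimic a SNP.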
