Index of Coincidence - Example

Example

As a practical illustration of the use of I.C., suppose that we have intercepted the following ciphertext message:

QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDHXC XJYEB IMTRQ WNMEA IZRVK CVKVL XNEIC FZPZC ZZHKM LVZVZ IZRRQ WDKEC HOSNY XXLSP MYKVQ XJTDC IOMEE XDQVS RXLRL KZHOV

(The grouping into five characters is just a telegraphic convention and has nothing to do with actual word lengths.) Suspecting this to be an English plaintext encrypted using a Vigenère cipher with normal A–Z components and a short repeating keyword, we can consider the ciphertext "stacked" into some number of columns, for example seven:

QPWKALV RXCQZIK GRBPFAE OMFLJMS DZVDHXC XJYEBIM TRQWN…

If the key size happens to have been the same as the assumed number of columns, then all the letters within a single column will have been enciphered using the same key letter, in effect a simple Caesar cipher applied to a random selection of English plaintext characters. The corresponding set of ciphertext letters should have a roughness of frequency distribution similar to that of English, although the letter identities have been permuted (shifted by a constant amount corresponding to the key letter). Therefore if we compute the aggregate delta I.C. for all columns ("delta bar"), it should be around 1.73. On the other hand, if we have incorrectly guessed the key size (number of columns), the aggregate delta I.C. should be around 1.00. So we compute the delta I.C. for assumed key sizes from one to ten:

Size Delta-bar I.C.
1 1.12
2 1.19
3 1.05
4 1.17
5 1.82
6 0.99
7 1.00
8 1.05
9 1.16
10 2.07

We see that the key size is most likely five. If the actual size is five, we would expect a width of ten to also report a high I.C., since each of its columns also corresponds to a simple Caesar encipherment, and we confirm this. So we should stack the ciphertext into five columns:

QPWKA LVRXC QZIKG RBPFA EOMFL JMSDZ VDH…

We can now try to determine the most likely key letter for each column considered separately, by performing trial Caesar decryption of the entire column for each of the 26 possibilities A–Z for the key letter, and choosing the key letter that produces the highest correlation between the decrypted column letter frequencies and the relative letter frequencies for normal English text. That correlation, which we don't need to worry about normalizing, can be readily computed as

where are the observed column letter frequencies and are the relative letter frequencies for English. When we try this, the best-fit key letters are reported to be "EVERY," which we recognize as an actual word, and using that for Vigenère decryption produces the plaintext:

MUSTC HANGE MEETI NGLOC ATION FROMB RIDGE TOUND ERPAS SSINC EENEM YAGEN TSARE BELIE VEDTO HAVEB EENAS SIGNE DTOWA TCHBR IDGES TOPME ETING TIMEU NCHAN GEDXX

from which one obtains:

MUST CHANGE MEETING LOCATION FROM BRIDGE TO UNDERPASS SINCE ENEMY AGENTS ARE BELIEVED TO HAVE BEEN ASSIGNED TO WATCH BRIDGE STOP MEETING TIME UNCHANGED XX

after word divisions have been restored at the obvious positions. "XX" are evidently "null" characters used to pad out the final group for transmission.

This entire procedure could easily be packaged into an automated algorithm for breaking such ciphers. Due to normal statistical fluctuation, such an algorithm will occasionally make wrong choices, especially when analyzing short ciphertext messages.

Read more about this topic:  Index Of Coincidence

Famous quotes containing the word example:

    Our intellect is not the most subtle, the most powerful, the most appropriate, instrument for revealing the truth. It is life that, little by little, example by example, permits us to see that what is most important to our heart, or to our mind, is learned not by reasoning but through other agencies. Then it is that the intellect, observing their superiority, abdicates its control to them upon reasoned grounds and agrees to become their collaborator and lackey.
    Marcel Proust (1871–1922)