Mutual Information - Relation To Other Quantities


Mutual information can be equivalently expressed as


\begin{align}
I(X;Y) & {} = H(X) - H(X|Y) \\
& {} = H(Y) - H(Y|X) \\
& {} = H(X) + H(Y) - H(X,Y) \\
& {} = H(X,Y) - H(X|Y) - H(Y|X)
\end{align}

where H(X) and H(Y) are the marginal entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X,Y) is the joint entropy of X and Y. Applying Jensen's inequality to the definition of mutual information shows that I(X;Y) is non-negative; consequently, H(X) ≥ H(X|Y).
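
As a quick numerical check of these identities, the sketch below computes the entropies in bits for an arbitrary 2×2 joint distribution and verifies the four equalities together with H(X) ≥ H(X|Y). It assumes NumPy; the helper entropy() and the toy distribution p_xy are illustrative choices, not part of the text above.

import numpy as np

# Illustrative joint distribution p(x, y); rows index x, columns index y.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])

def entropy(p):
    # Shannon entropy in bits; zero-probability outcomes contribute nothing.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_x = p_xy.sum(axis=1)               # marginal p(x)
p_y = p_xy.sum(axis=0)               # marginal p(y)

H_X  = entropy(p_x)
H_Y  = entropy(p_y)
H_XY = entropy(p_xy.ravel())         # joint entropy H(X,Y)

# Conditional entropies computed directly from the conditional distributions,
# e.g. H(X|Y) = sum_y p(y) H(X | Y = y).
H_X_given_Y = sum(p_y[j] * entropy(p_xy[:, j] / p_y[j]) for j in range(p_xy.shape[1]))
H_Y_given_X = sum(p_x[i] * entropy(p_xy[i, :] / p_x[i]) for i in range(p_xy.shape[0]))

I = H_X + H_Y - H_XY                 # third identity
print(np.isclose(I, H_X - H_X_given_Y))                  # True
print(np.isclose(I, H_Y - H_Y_given_X))                  # True
print(np.isclose(I, H_XY - H_X_given_Y - H_Y_given_X))   # True
print(I >= 0 and H_X >= H_X_given_Y)                     # True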

Intuitively, if entropy H(X) is regarded as a measure of uncertainty about a random variable, then H(X|Y) is a measure of what Y does not say about X. This is "the amount of uncertainty remaining about X after Y is known", and thus the right side of the first of these equalities can be read as "the amount of uncertainty in X, minus the amount of uncertainty in X which remains after Y is known", which is equivalent to "the amount of uncertainty in X which is removed by knowing Y". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.

Note that in the discrete case H(X|X) = 0 and therefore H(X) = I(X;X). Thus I(X;X) ≥ I(X;Y), and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide.
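
Spelled out using the first identity above, together with H(X|X) = 0 and the non-negativity of conditional entropy,


\begin{align}
I(X;Y) & {} = H(X) - H(X|Y) \le H(X) - H(X|X) = I(X;X).
\end{align}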

Mutual information can also be expressed as a Kullback-Leibler divergence of the product p(x) × p(y) of the marginal distributions of the two random variables X and Y from p(x,y), the random variables' joint distribution:


\begin{align}
I(X;Y) & {} = D_{\mathrm{KL}}(p(x,y)\|p(x)\,p(y)).
\end{align}


Furthermore, let p(x|y) = p(x, y) / p(y). Then


\begin{align}
I(X;Y) & {} = \sum_y p(y) \sum_x p(x|y) \log_2 \frac{p(x|y)}{p(x)} \\
& {} = \sum_y p(y) \; D_{\mathrm{KL}}(p(x|y)\|p(x)) \\
& {} = \mathbb{E}_Y\{D_{\mathrm{KL}}(p(x|y)\|p(x))\}.
\end{align}

Thus mutual information can also be understood as the expectation of the Kullback-Leibler divergence of the univariate distribution p(x) of X from the conditional distribution p(x|y) of X given Y: the more different the distributions p(x|y) and p(x), the greater the information gain.
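
Both Kullback-Leibler forms can be checked numerically as well. The sketch below assumes NumPy and reuses the same illustrative joint distribution p_xy as in the earlier example; the helper kl() is likewise an illustrative name, not from the text.

import numpy as np

# Same illustrative joint distribution as before; rows index x, columns index y.
p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def kl(p, q):
    # D_KL(p || q) in bits; terms with p = 0 contribute nothing.
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# I(X;Y) = D_KL(p(x,y) || p(x) p(y)): divergence between the joint
# distribution and the product of the marginals.
I_joint = kl(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# I(X;Y) = E_Y[ D_KL(p(x|y) || p(x)) ]: average of the per-y divergences.
I_expected = sum(p_y[j] * kl(p_xy[:, j] / p_y[j], p_x)
                 for j in range(p_xy.shape[1]))

print(np.isclose(I_joint, I_expected))   # True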
