Relation To Other Quantities

Mutual information can be equivalently expressed as


\begin{align}
I(X;Y) & {} = H(X) - H(X|Y) \\
& {} = H(Y) - H(Y|X) \\
& {} = H(X) + H(Y) - H(X,Y) \\
& {} = H(X,Y) - H(X|Y) - H(Y|X)
\end{align}

where H(X) and H(Y) are the marginal entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X,Y) is the joint entropy of X and Y. Applying Jensen's inequality to the definition of mutual information shows that I(X;Y) is non-negative; consequently, H(X) ≥ H(X|Y).
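
As a concrete check of these identities, here is a minimal sketch, assuming a small hypothetical 2×2 joint distribution and using NumPy, that computes the entropies in bits and confirms that the different expressions for I(X;Y) agree:

import numpy as np

# Hypothetical 2x2 joint distribution p(x,y); rows index x, columns index y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)   # marginal distribution of X
p_y = p_xy.sum(axis=0)   # marginal distribution of Y

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_X, H_Y, H_XY = entropy(p_x), entropy(p_y), entropy(p_xy.ravel())
H_X_given_Y = H_XY - H_Y   # chain rule: H(X|Y) = H(X,Y) - H(Y)
H_Y_given_X = H_XY - H_X

print(H_X - H_X_given_Y)                  # I(X;Y), about 0.278 bits
print(H_X + H_Y - H_XY)                   # same value
print(H_XY - H_X_given_Y - H_Y_given_X)   # same value

The conditional entropies here are obtained through the chain rule H(X|Y) = H(X,Y) − H(Y), which is consistent with the identities above.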

Intuitively, if entropy H(X) is regarded as a measure of uncertainty about a random variable, then H(X|Y) is a measure of what Y does not say about X. This is "the amount of uncertainty remaining about X after Y is known", and thus the right side of the first of these equalities can be read as "the amount of uncertainty in X, minus the amount of uncertainty in X which remains after Y is known", which is equivalent to "the amount of uncertainty in X which is removed by knowing Y". This corroborates the intuitive meaning of mutual information as the amount of information (that is, reduction in uncertainty) that knowing either variable provides about the other.

Note that in the discrete case H(X|X) = 0, and therefore I(X;X) = H(X). Since H(X|Y) ≥ 0, it follows that I(X;Y) = H(X) − H(X|Y) ≤ H(X) = I(X;X), so one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide.
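
To see I(X;X) = H(X) concretely: the joint distribution of X with itself puts all of its mass on the diagonal, so H(X,X) = H(X). A short sketch, reusing the hypothetical marginal from the example above, confirms this:

import numpy as np

p_x = np.array([0.5, 0.5])   # hypothetical marginal of X (as in the sketch above)
p_xx = np.diag(p_x)          # joint distribution of (X, X): all mass on the diagonal

def entropy(p):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy(p_x))                              # H(X) = 1.0 bit
print(2 * entropy(p_x) - entropy(p_xx.ravel()))  # I(X;X) = 2 H(X) - H(X,X) = 1.0 bit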

Mutual information can also be expressed as a Kullback-Leibler divergence of the product p(x) × p(y) of the marginal distributions of the two random variables X and Y from p(x,y), the random variables' joint distribution:


\begin{align}
I(X;Y) = D_{\mathrm{KL}}(p(x,y)\|p(x)\,p(y)).
\end{align}

Furthermore, let p(x|y) = p(x, y) / p(y). Then


\begin{align}
I(X;Y) & {} = \sum_y p(y) \sum_x p(x|y) \log_2 \frac{p(x|y)}{p(x)} \\
& {} = \sum_y p(y) \; D_{\mathrm{KL}}(p(x|y)\|p(x)) \\
& {} = \mathbb{E}_Y\{D_{\mathrm{KL}}(p(x|y)\|p(x))\}.
\end{align}

Thus mutual information can also be understood as the expectation of the Kullback-Leibler divergence of the univariate distribution p(x) of X from the conditional distribution p(x|y) of X given Y: the more different the distributions p(x|y) and p(x), the greater the information gain.
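
Both Kullback-Leibler forms can be checked numerically against the entropy-based expressions, again assuming the small hypothetical joint distribution used in the earlier sketch:

import numpy as np

# Hypothetical 2x2 joint distribution (as in the earlier sketch).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

def kl(p, q):
    """D_KL(p || q) in bits; assumes q > 0 wherever p > 0."""
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

# Divergence of the product of marginals from the joint distribution.
i_joint = kl(p_xy.ravel(), np.outer(p_x, p_y).ravel())

# Expectation over Y of D_KL(p(x|y) || p(x)).
i_cond = sum(p_y[j] * kl(p_xy[:, j] / p_y[j], p_x) for j in range(len(p_y)))

print(i_joint, i_cond)   # both about 0.278 bits, matching H(X) + H(Y) - H(X,Y)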
