Conditional Entropy - Chain Rule

Chain Rule

Assume that the combined system determined by two random variables $X$ and $Y$ has entropy $H(X,Y)$, that is, we need $H(X,Y)$ bits of information to describe its exact state. Now if we first learn the value of $X$, we have gained $H(X)$ bits of information. Once $X$ is known, we only need $H(X,Y)-H(X)$ bits to describe the state of the whole system. This quantity is exactly $H(Y|X)$, which gives the chain rule of conditional entropy:

$H(Y|X) = H(X,Y) - H(X).$
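Two illustrative cases, chosen here rather than taken from the source, show this bookkeeping with entropies measured in bits. If $X$ and $Y$ are independent fair bits, then $H(X,Y)=\log_2 4 = 2$ and $H(X)=1$, so $H(Y|X)=2-1=1=H(Y)$: learning $X$ tells us nothing about $Y$. If instead $Y=X$ for a fair bit $X$, then $H(X,Y)=H(X)=1$, so $H(Y|X)=1-1=0$: once $X$ is known, no further bits are needed.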

Formally, the chain rule follows from the definition of conditional entropy. The third step below uses the marginalization $\sum_{y\in\mathcal Y}p(x,y)=p(x)$:

$$\begin{aligned} H(Y|X)=&\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log \frac {p(x)} {p(x,y)}\\ =&-\sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x,y) + \sum_{x\in\mathcal X, y\in\mathcal Y}p(x,y)\log\,p(x) \\ =& H(X,Y) + \sum_{x \in \mathcal X} p(x)\log\,p(x) \\ =& H(X,Y) - H(X). \end{aligned}$$
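The identity can also be checked numerically. The sketch below is a minimal Python illustration. It assumes a small, made-up joint distribution over two binary variables and uses base-2 logarithms. The helper names `entropy`, `marginal_x`, and `conditional_entropy` are hypothetical and do not come from any library. It computes $H(Y|X)$ directly from the sum $\sum p(x,y)\log \frac{p(x)}{p(x,y)}$ and compares the result with $H(X,Y)-H(X)$.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal_x(joint):
    """Marginal p(x) = sum over y of p(x, y); joint maps (x, y) -> p(x, y)."""
    px = defaultdict(float)
    for (x, _y), p in joint.items():
        px[x] += p
    return dict(px)


def conditional_entropy(joint):
    """H(Y|X) from its definition: sum of p(x,y) * log2(p(x) / p(x,y))."""
    px = marginal_x(joint)
    return sum(p * math.log2(px[x] / p) for (x, _y), p in joint.items() if p > 0)


# Illustrative (made-up) joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

h_xy = entropy(joint.values())              # H(X,Y)
h_x = entropy(marginal_x(joint).values())   # H(X)
h_y_given_x = conditional_entropy(joint)    # H(Y|X), computed directly

print(f"H(X,Y) = {h_xy:.4f}")                 # H(X,Y) = 1.7500
print(f"H(X)   = {h_x:.4f}")                  # H(X)   = 0.8113
print(f"H(Y|X) = {h_y_given_x:.4f}")          # H(Y|X) = 0.9387
# Chain rule: H(Y|X) = H(X,Y) - H(X), up to floating-point error.
print(math.isclose(h_y_given_x, h_xy - h_x))  # True
```

Computing $H(Y|X)$ from the definition, rather than as the difference $H(X,Y)-H(X)$, keeps the check independent of the identity being tested.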
