Conditional Entropy - Chain Rule

Chain Rule

Assume that the combined system determined by two random variables X and Y has joint entropy H(X,Y), that is, we need H(X,Y) bits of information on average to describe its exact state. Now if we first learn the value of X, we have gained H(X) bits of information. Once X is known, we only need H(X,Y) - H(X) bits to describe the state of the whole system. This quantity is exactly H(Y|X), which gives the chain rule of conditional entropy:

\begin{align}
H(Y|X) = H(X,Y) - H(X).
\end{align}
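As a quick sanity check (the two toy distributions here are illustrative choices, not taken from the text): if Y is an exact copy of a fair bit X, the joint system has only two equally likely states, while if Y is an independent fair bit it has four:

\begin{align}
H(Y|X) &= H(X,Y) - H(X) = 1 - 1 = 0 \quad \text{(Y a copy of X)}, \\
H(Y|X) &= H(X,Y) - H(X) = 2 - 1 = 1 \quad \text{(Y independent of X)}.
\end{align}

Knowing X removes all remaining uncertainty about a copy of itself, and none about an independent bit.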

Formally, the chain rule indeed follows from the above definition of conditional entropy:

\begin{align}
H(Y|X) &= \sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\log \frac{p(x)}{p(x,y)} \\
&= -\sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\log p(x,y) + \sum_{x\in\mathcal X,\, y\in\mathcal Y} p(x,y)\log p(x) \\
&= H(X,Y) + \sum_{x \in \mathcal X} p(x)\log p(x) \\
&= H(X,Y) - H(X).
\end{align}
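The derivation is also easy to verify numerically. Below is a minimal sketch in Python with NumPy (an assumption of this edit, not code from the source): the 2x2 joint table p_xy is an arbitrary illustrative distribution and entropy is a hypothetical helper. The script computes H(X,Y), H(X), and H(Y|X) directly from their definitions and confirms that H(Y|X) = H(X,Y) - H(X).

import numpy as np

# Arbitrary illustrative joint distribution p(x, y);
# rows index x in {0, 1}, columns index y in {0, 1}, entries sum to 1.
p_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])

def entropy(p):
    # Shannon entropy in bits; zero-probability cells contribute nothing.
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

H_xy = entropy(p_xy)             # joint entropy H(X,Y)
H_x = entropy(p_xy.sum(axis=1))  # marginal entropy H(X), with p(x) = sum_y p(x,y)

# Conditional entropy from the definition
# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), where p(y|x) = p(x,y)/p(x).
p_x = p_xy.sum(axis=1, keepdims=True)
H_y_given_x = -np.sum(p_xy * np.log2(p_xy / p_x))

print(f"H(X,Y)        = {H_xy:.6f} bits")
print(f"H(X)          = {H_x:.6f} bits")
print(f"H(Y|X)        = {H_y_given_x:.6f} bits")
print(f"H(X,Y) - H(X) = {H_xy - H_x:.6f} bits")  # equals H(Y|X)

For this table the script reports H(X,Y) = 1.846440 bits and H(X) = 1 bit, so both computations of H(Y|X) agree at 0.846440 bits, as the chain rule requires.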
