Entropy (Information Theory) - Introduction

Introduction

Entropy is a measure of unpredictability or information content. To get an informal, intuitive understanding of the connection between these three English terms, consider the example of a poll on some political issue. Usually, such polls happen because the outcome of the poll isn't already known. In other words, the outcome of the poll is relatively unpredictable, and actually performing the poll and learning the results gives some new information; these are just different ways of saying that the entropy of the poll results is large. Now consider the case that the same poll is performed a second time shortly after the first poll. Since the result of the first poll is already known, the outcome of the second poll can be predicted well and the results should not contain any new information; in this case the entropy of the second poll results is small.

Now consider the example of a coin toss. When the coin is fair, that is, when the probability of heads is the same as the probability of tails, the entropy of the coin toss is as high as it can be. This is because there is no way to predict the outcome of the coin toss ahead of time: the best we can do is predict that the coin will come up heads, and our prediction will be correct with probability 1/2. Such a coin toss has one bit of entropy, since there are two possible outcomes that occur with equal probability, and learning the actual outcome contains one bit of information. By contrast, a coin toss with a coin that has two heads and no tails has zero entropy, since the coin will always come up heads and the outcome can be predicted perfectly. Most collections of data in the real world lie somewhere in between.
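
The figures above follow from the standard Shannon formula H = -Σ p·log2(p), summed over the possible outcomes. The short Python sketch below (the function name entropy and the example probabilities are only illustrative) reproduces the coin-toss numbers.

    import math

    def entropy(probabilities):
        """Shannon entropy in bits: H = -sum(p * log2(p)) over outcomes with p > 0."""
        return -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit
    print(entropy([1.0, 0.0]))   # two-headed coin: zero entropy, no surprise at all
    print(entropy([0.9, 0.1]))   # a biased coin: about 0.47 bits, somewhere in between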

English text has fairly low entropy. In other words, it is fairly predictable. Even if we don't know exactly what is going to come next, we can be fairly certain that, for example, there will be many more e's than z's, or that the combination 'qu' will be much more common than any other combination with a 'q' in it, and the combination 'th' will be more common than 'z', 'q', or 'qu'. Uncompressed, English text has about one bit of entropy per character of message, even though each character is commonly encoded in eight bits.
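
One way to see how far English is from a random stream of characters is to estimate entropy from letter frequencies. The Python sketch below (the sample string and function name are only illustrative) computes the entropy of the single-character frequency distribution. That alone gives roughly four bits per character; the figure of about one bit quoted above also accounts for longer-range structure such as spelling, word choice, and grammar, which this simple frequency estimate ignores.

    from collections import Counter
    import math

    def char_entropy(text):
        """Entropy (bits per character) of the single-character frequency distribution."""
        counts = Counter(text)
        total = len(text)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    sample = "the quick brown fox jumps over the lazy dog "
    print(char_entropy(sample))   # roughly 4 bits per character from letter frequencies alone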

If a compression scheme is lossless—that is, you can always recover the entire original message by uncompressing—then a compressed message has the same quantity of information as the original, but communicated in fewer bits. That is, it has more information per bit, or a higher entropy. This means a compressed message is more unpredictable, because there is no redundancy; each bit of the message is communicating a unique bit of information. Roughly speaking, Shannon's source coding theorem says that a lossless compression scheme cannot compress messages, on average, to have more than one bit of information per bit of message. The entropy per symbol of a message, multiplied by the length of that message, is a measure of how much total information the message contains.
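
This removal of redundancy can be observed with any general-purpose lossless compressor. The sketch below uses Python's zlib (a DEFLATE implementation, not an entropy-optimal code, so the sizes only illustrate the trend) to compare a highly repetitive, low-entropy text with the same number of random bytes.

    import os
    import zlib

    english_like = ("the quick brown fox jumps over the lazy dog " * 200).encode()
    random_bytes = os.urandom(len(english_like))

    # Low-entropy (repetitive) input compresses to a small fraction of its size...
    print(len(english_like), "->", len(zlib.compress(english_like)))
    # ...while high-entropy (random) input barely compresses, and may even grow slightly.
    print(len(random_bytes), "->", len(zlib.compress(random_bytes)))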

Shannon's theorem also implies that no lossless compression scheme can compress all messages. If some messages come out smaller, at least one must come out larger. In practical use, this is not a problem, because we are generally only interested in compressing certain types of messages, for example English documents as opposed to gibberish text, or digital photographs rather than noise, and it is unimportant if our compression algorithm makes certain kinds of random sequences larger.
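
The reason is a simple counting (pigeonhole) argument: a lossless compressor must map distinct inputs to distinct outputs, and there are fewer short bit strings than long ones. The sketch below just does the arithmetic for strings of length 16 (the choice of 16 is arbitrary).

    # There are 2**n bit strings of length n, but only 2**n - 1 strings of length
    # strictly less than n (lengths 0 through n-1 combined), so no lossless code
    # can map every length-n string to a strictly shorter one.
    n = 16
    strings_of_length_n = 2 ** n
    strictly_shorter_strings = sum(2 ** k for k in range(n))
    print(strings_of_length_n, strictly_shorter_strings)   # 65536 vs 65535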
