Information Gain in Decision Trees

In information theory and machine learning, information gain is a synonym for Kullback–Leibler divergence.

In particular, the information gain about a random variable X obtained from the observation that a random variable A takes the value A = a is the Kullback–Leibler divergence DKL(p(x | a) || p(x | I)) of the posterior distribution p(x | a) for x given a from the prior distribution p(x | I) for x.
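
To make this concrete, here is a minimal Python sketch of that divergence for a discrete variable; the prior and posterior probabilities are made-up toy values, not taken from any source.

```python
import math

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) for discrete distributions given as
    lists of probabilities over the same outcomes (base-2 logs, so
    the result is in bits)."""
    return sum(p * math.log2(p / q)
               for p, q in zip(posterior, prior) if p > 0)

# Hypothetical example: prior belief about X, updated after observing A = a.
prior = [0.5, 0.5]       # p(x | I)
posterior = [0.9, 0.1]   # p(x | a)
print(kl_divergence(posterior, prior))  # information gained from A = a, ~0.531 bits
```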

The expected value of the information gain is the mutual information I(X; A) of X and A — i.e. the reduction in the entropy of X achieved by learning the state of the random variable A.
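
This identity can be checked numerically. The sketch below, again with made-up toy distributions, computes the expected information gain both as the average Kullback–Leibler divergence over the outcomes of A and as the entropy drop H(X) - H(X | A); the two agree.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def kl(p, q):
    """D_KL(p || q) in bits for discrete distributions."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: A takes two values; observing each one updates X.
p_a = [0.4, 0.6]                          # p(a)
p_x_given_a = [[0.9, 0.1], [0.2, 0.8]]    # posterior p(x | a) for each a
p_x = [sum(pa * pxa[i] for pa, pxa in zip(p_a, p_x_given_a))
       for i in range(2)]                 # prior p(x | I), marginalising over A

expected_gain = sum(pa * kl(pxa, p_x) for pa, pxa in zip(p_a, p_x_given_a))
entropy_drop = entropy(p_x) - sum(pa * entropy(pxa)
                                  for pa, pxa in zip(p_a, p_x_given_a))
print(round(expected_gain, 6), round(entropy_drop, 6))  # both equal I(X; A)
```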

In machine learning, this concept can be used to define a preferred sequence of attributes to investigate, so as to narrow down the state of X as rapidly as possible. Such a sequence (which at each stage depends on the outcomes of investigating the previous attributes) is called a decision tree. Usually an attribute with high information gain should be preferred to other attributes.
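
As a sketch of that selection rule, the toy Python below (the dataset, attribute names, and helpers entropy and information_gain are all invented for illustration) scores each attribute by its information gain on a labeled dataset and picks the best one for the root split.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction in the labels from splitting rows on attribute attr.
    rows is a list of dicts; attr is a key into each row."""
    by_value = {}
    for row, label in zip(rows, labels):
        by_value.setdefault(row[attr], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder

# Hypothetical toy data: predict whether to play from weather attributes.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["yes", "yes", "no", "no"]

best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)  # "outlook": gain 1.0 bit, versus 0.0 for "windy"
```

In a full tree-building algorithm such as ID3, this selection step is applied recursively to each subset of the data produced by the chosen split.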
