Transduction (machine Learning) - Example Problem

Example Problem

The following example problem contrasts some of the unique properties of transduction against induction.

A collection of points is given, such that some of the points are labeled (A, B, or C), but most of the points are unlabeled (?). The goal is to predict appropriate labels for all of the unlabeled points.

The inductive approach to solving this problem is to use the labeled points to train a supervised learning algorithm, and then have it predict labels for all of the unlabeled points. With this problem, however, the supervised learning algorithm will only have five labeled points to use as a basis for building a predictive model. It will certainly struggle to build a model that captures the structure of this data. For example, if a nearest-neighbor algorithm is used, then the points near the middle will be labeled "A" or "C", even though it is apparent that they belong to the same cluster as the point labeled "B".

Transduction has the advantage of being able to consider all of the points, not just the labeled points, while performing the labeling task. In this case, transductive algorithms would label the unlabeled points according to the clusters to which they naturally belong. The points in the middle, therefore, would most likely be labeled "B", because they are packed very close to that cluster.

An advantage of transduction is that it may be able to make better predictions with fewer labeled points, because it uses the natural breaks found in the unlabeled points. One disadvantage of transduction is that it builds no predictive model. If a previously unknown point is added to the set, the entire transductive algorithm would need to be repeated with all of the points in order to predict a label. This can be computationally expensive if the data is made available incrementally in a stream. Further, this might cause the predictions of some of the old points to change (which may be good or bad, depending on the application). A supervised learning algorithm, on the other hand, can label new points instantly, with very little computational cost.

Read more about this topic:  Transduction (machine Learning)

Famous quotes containing the word problem:

    Give a scientist a problem and he will probably provide a solution; historians and sociologists, by contrast, can offer only opinions. Ask a dozen chemists the composition of an organic compound such as methane, and within a short time all twelve will have come up with the same solution of CH4. Ask, however, a dozen economists or sociologists to provide policies to reduce unemployment or the level of crime and twelve widely differing opinions are likely to be offered.
    Derek Gjertsen, British scientist, author. Science and Philosophy: Past and Present, ch. 3, Penguin (1989)

    The family environment in which your children are growing up is different from that in which you grew up. The decisions our parents made and the strategies they used were developed in a different context from what we face today, even if the “content” of the problem is the same. It is a mistake to think that our own experience as children and adolescents will give us all we need to help our children. The rules of the game have changed.
    Lawrence Kutner (20th century)