Use of The Data Set
Based on Fisher's linear discriminant model, this data set became a typical test case for many classification techniques in machine learning such as support vector machines.
The use of this data set in cluster analysis however is uncommon, since the data set only contains two clusters with rather obvious separation. One of the clusters contains Iris setosa, while the other cluster contains both Iris virginica and Iris versicolor and is not separable without the species information Fisher used. This makes the data set a good example to explain the difference between supervised and unsupervised techniques in data mining: Fisher's linear discriminant model can only be obtained when the object species are known: class labels and clusters are not necessarily the same.
Nevertheless, all three species of Iris are separable in the projection on the nonlinear branching principal component The data set is approximated by the closest tree with some penalty for the excessive number of nodes, bending and stretching. Then the so-called "metro map" is constructed. The data points are projected into the closest node. For each node the pie diagram of the projected points is prepared. The area of the pie is proportional to the number of the projected points. It is clear from the Fig (left) that the absolute majority of the samples of the different Iris species belong to the different nodes. Only a small fraction of Iris-virginica is mixed with Iris-versicolor (the mixed blue-grin nodes in the Fig.). Therefore, the three species of Iris (Iris setosa, Iris virginica and Iris versicolor) are separable by the unsupervising procedures of nonlinear principal component analysis. To discriminate them, it is sufficient just to select the corresponding nodes on the principal tree.
Read more about this topic: Iris Flower Data Set
Famous quotes containing the words data and/or set:
“This city is neither a jungle nor the moon.... In long shot: a cosmic smudge, a conglomerate of bleeding energies. Close up, it is a fairly legible printed circuit, a transistorized labyrinth of beastly tracks, a data bank for asthmatic voice-prints.”
—Susan Sontag (b. 1933)
“Really, if the lower orders dont set us a good example, what on earth is the use of them? They seem, as a class, to have absolutely no sense of moral responsibility.”
—Oscar Wilde (18541900)