Basic Procedure
- Formulate the problem - select the variables to which you wish to apply the clustering technique
- Select a distance measure - common ways of computing the distance between two observations:
  - Squared Euclidean distance - the sum of the squared differences in value for each variable
  - Manhattan distance - the sum of the absolute differences in value for each variable
  - Chebyshev distance - the maximum absolute difference in value for any variable
  - Mahalanobis distance - this measure rescales the differences between observations using the covariance matrix of the variables, so that correlated variables are not counted twice. This is an important measure since it is unit invariant (it can figuratively compare apples to oranges)
- Select a clustering procedure (see below)
- Decide on the number of clusters
- Map and interpret the clusters and draw conclusions - illustrative techniques such as perceptual maps, icicle plots, and dendrograms are useful
- Assess reliability and validity - various methods:
  - repeat the analysis using a different distance measure
  - repeat the analysis using a different clustering technique
  - split the data randomly into two halves and analyze each half separately
  - repeat the analysis several times, deleting one variable each time
  - repeat the analysis several times, using a different order of cases each time
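
The distance measures listed in the procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the restriction of the Mahalanobis helper to exactly two variables, and the sample customer data (age in years, income in thousands) are all assumptions made for the example.

```python
from statistics import mean


def squared_euclidean(a, b):
    """Sum of squared differences across all variables."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def manhattan(a, b):
    """Sum of absolute differences across all variables."""
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))


def covariance_2x2(rows):
    """Sample covariance terms (sxx, sxy, syy) for two-variable data."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    n = len(rows)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def squared_mahalanobis_2d(a, b, rows):
    """Squared Mahalanobis distance between a and b (two variables only).

    Computes d^T S^-1 d, where d = a - b and S is the sample covariance
    matrix estimated from rows. The 2x2 inverse is written out by hand.
    """
    sxx, sxy, syy = covariance_2x2(rows)
    det = sxx * syy - sxy ** 2
    dx, dy = a[0] - b[0], a[1] - b[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical customers: (age in years, income in thousands)
customers = [(25, 30), (32, 45), (41, 52), (38, 80), (50, 61)]

# Same customers with income expressed in single currency units instead
rescaled = [(age, income * 1000) for age, income in customers]

d_original = squared_mahalanobis_2d(customers[0], customers[3], customers)
d_rescaled = squared_mahalanobis_2d(rescaled[0], rescaled[3], rescaled)
```

Changing the income unit leaves the Mahalanobis result unchanged (up to floating-point rounding), whereas the other three measures would be dominated by whichever variable has the largest numeric range. That is the practical meaning of unit invariance when variables are recorded on different scales.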