Construction Issues
The above example is a simple one. There are only 7 choices of bucket boundaries. One could compute the cumulative variance for all 7 options easily and choose the absolute best placement. However, as the range of values gets larger and the number of buckets gets larger, the set of possible histograms grows exponentially and it becomes a dauntingly complex problem to find the set of boundaries that provide the absolute minimum variance. A solution is to give up on finding the absolute best solution and attempt to find a good solution instead. By creating random solutions, using those as a starting point and improving upon them, one can find a solution that is a fair approximation of the "best" solution. One construction method used to get around this problem is the Iterative Improvement algorithm. Another is Simulated Annealing. The two may be combined in Two Phase Optimization, or 2PO. These algorithms are put forth in "Randomized Algorithms..." (cited below) as a method to optimize queries, but the general idea may be applied to construction of V-optimal Histograms.
Read more about this topic: V-optimal Histograms
Famous quotes containing the words construction and/or issues:
“Theres no art
To find the minds construction in the face.”
—William Shakespeare (15641616)
“The hard truth is that what may be acceptable in elite culture may not be acceptable in mass culture, that tastes which pose only innocent ethical issues as the property of a minority become corrupting when they become more established. Taste is context, and the context has changed.”
—Susan Sontag (b. 1933)