Kernel Density Estimation - Definition

Definition

Let (x1, x2, …, xn) be an iid sample drawn from some distribution with an unknown density ƒ. We are interested in estimating the shape of this function ƒ. Its kernel density estimator is

 \hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) \quad = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big),

where K(•) is the kernel — a symmetric but not necessarily positive function that integrates to one — and h > 0 is a smoothing parameter called the bandwidth. A kernel with subscript h is called the scaled kernel and defined as Kh(x) = 1/h K(x/h). Intuitively one wants to choose h as small as the data allow, however there is always a trade-off between the bias of the estimator and its variance; more on the choice of bandwidth later.

A range of kernel functions are commonly used: uniform, triangular, biweight, triweight, Epanechnikov, normal, and others. The Epanechnikov kernel is optimal in a minimum variance sense, though the loss of efficiency is small for the kernels listed previously, and due to its convenient mathematical properties, the normal kernel is often used K(x) = ϕ(x), where ϕ is the standard normal density function.

The construction of a kernel density estimate finds interpretations in fields outside of density estimation. For example, in thermodynamics, this is equivalent to the amount of heat generated when heat kernels (the fundamental solution to the heat equation) are placed at each data point locations xi. Similar methods are used to construct discrete Laplace operators on point clouds for manifold learning.

Read more about this topic:  Kernel Density Estimation

Famous quotes containing the word definition:

    Beauty, like all other qualities presented to human experience, is relative; and the definition of it becomes unmeaning and useless in proportion to its abstractness. To define beauty not in the most abstract, but in the most concrete terms possible, not to find a universal formula for it, but the formula which expresses most adequately this or that special manifestation of it, is the aim of the true student of aesthetics.
    Walter Pater (1839–1894)

    The definition of good prose is proper words in their proper places; of good verse, the most proper words in their proper places. The propriety is in either case relative. The words in prose ought to express the intended meaning, and no more; if they attract attention to themselves, it is, in general, a fault.
    Samuel Taylor Coleridge (1772–1834)

    The very definition of the real becomes: that of which it is possible to give an equivalent reproduction.... The real is not only what can be reproduced, but that which is always already reproduced. The hyperreal.
    Jean Baudrillard (b. 1929)